A guest lecture in the Master elective "The Blind Spot: Tracking Young Media Users" by Susanne Baumgartner
What’s big data? Some examples Problems A glimpse in the kitchen Questions?
#bigdata in Communication Science
Some examples from research by me and my students
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Department of Communication Science, Universiteit van Amsterdam
October 2013
#bigdata Damian Trilling
1 What’s big data?
2 Some examples
   Rare events
   Tone in tweets
   Counting words and n-grams
   Network analysis
3 Problems
4 A glimpse in the kitchen
5 Questions?
What’s big data?
No definition, but ...
• Existing data
• Too big to code manually
• Sometimes also too big to handle with normal tools
• New research questions
• Call to revisit the relationship between theory and empirical research
Some sources
• Social Network Sites
• RSS feeds
• Databases
• Scraping text from the web
• ...
It’s out there! You only have to collect it.
Some examples
Rare events
A recent master thesis
Imagine you want to analyze some very rare content. Normal sampling won’t work, that’s for sure.
So you’d better collect everything first
Getting all news coverage from Dutch news sites
We collected all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles. In a second step, we kept only the articles containing specific keywords. Those 292 articles were then manually coded.
Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
It’s just one line of code!
urls.txt
http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermann-bittet-um-verzeihung
http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierung-will-zuruecktreten
http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klage-gegen-republik
http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafe-wegen-oelpest
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-kein-babybauch-nur-fast-food
...
wget command
wget -i urls.txt
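The second step described above, narrowing a large collection down to the few articles that mention specific keywords, might look roughly like the following in Python. This is only a sketch under assumptions: the article names, texts and keywords are made up, and the matching rule (case-insensitive substring search) is a guess, not the procedure used in the thesis.

```python
def contains_keyword(text, keywords):
    """Return True if at least one keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def filter_articles(articles, keywords):
    """Keep only the articles (name -> text) that mention at least one keyword."""
    return {
        name: text
        for name, text in articles.items()
        if contains_keyword(text, keywords)
    }


# Made-up sample data standing in for the downloaded articles.
articles = {
    "a1.html": "The minister announced the plan on Twitter this morning.",
    "a2.html": "Heavy rain is expected in the south tomorrow.",
    "a3.html": "According to a Facebook post, the concert is cancelled.",
}
keywords = ["twitter", "facebook"]  # hypothetical keyword list

selected = filter_articles(articles, keywords)
print(sorted(selected))
```

Substring matching is deliberately crude: it also matches longer words that contain a keyword, which is acceptable when the selected articles are checked by hand afterwards anyway.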
Tone in tweets
A recent bachelor thesis
Imagine you want to know something about someone’s behavior on Twitter, or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?
So you’d better think about automating your coding
Finding out how negative or positive politicians are towards their opponents
We took lists of positive and negative words and a list of each politician’s opponents. We used a Python script to check which type of words were used to refer to opponents. For further analysis, the results were imported into SPSS.
Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
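A dictionary-based coding script of this kind could be sketched as follows. It is not the script from the thesis: the word lists, the opponent names and the column names are invented for illustration, and the sketch counts tone words anywhere in a tweet that mentions an opponent, which is cruder than checking which words actually refer to the opponent.

```python
import csv
import io
import re

# Made-up word lists for illustration; the thesis used its own lists.
POSITIVE = {"good", "great", "strong"}
NEGATIVE = {"bad", "weak", "wrong"}
OPPONENTS = {"smith", "jones"}  # hypothetical opponent names

FIELDS = ["tweet_id", "mentions_opponent", "positive", "negative"]


def tokenize(text):
    """Lowercase the text and split it into words, dropping @, # and punctuation."""
    return re.findall(r"\w+", text.lower())


def code_tweet(tweet_id, text):
    """Code one tweet: does it mention an opponent, and how many tone words occur?"""
    tokens = tokenize(text)
    return {
        "tweet_id": tweet_id,
        "mentions_opponent": int(any(t in OPPONENTS for t in tokens)),
        "positive": sum(t in POSITIVE for t in tokens),
        "negative": sum(t in NEGATIVE for t in tokens),
    }


def to_csv(rows):
    """Serialize the coded tweets as CSV text, one row per tweet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented example tweets.
tweets = [
    (1, "@Smith is wrong about taxes and has a weak plan"),
    (2, "Great day in Amsterdam"),
]
rows = [code_tweet(tweet_id, text) for tweet_id, text in tweets]
print(to_csv(rows))
```

Writing the result as CSV keeps the coding step separate from the statistics: the file can be opened in SPSS or any other package for further analysis.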
Counting words and n-grams
How often are specific expressions used?
Imagine you want to know which words or expressions dominate a discourse. There are plenty of ways to get an answer within minutes; here’s one:
Again, just one or two lines of code!
For example, with Stata
• Install the package wordscores (net install http://www.tcd.ie/Political_Science/wordscores/wordscores)
• for word counts: wordfreq /home/dami/texts/lab92.txt /home/dami/texts/lab97.txt
• for n-grams (trigrams in this case): phrasefreq 3 lab92.txt lab97.txt
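For readers without Stata, the same kind of word and n-gram counts can be approximated with Python's standard library. The sketch below is not how the Stata package works internally; the tokenization rule and the sample sentence are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, each joined into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, n=1):
    """Count how often each n-gram occurs in the text (n=1 gives word counts)."""
    return Counter(ngrams(tokenize(text), n))


# Invented sample text; in practice this would be read from a file.
text = "we can do it and we can do it together"

print(count_ngrams(text, 1).most_common(3))  # most frequent words
print(count_ngrams(text, 3).most_common(3))  # most frequent trigrams
```

Because `count_ngrams` returns a `Counter`, counts from several texts can simply be added together to compare or merge corpora.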
[Figure: trigrams in Obama tweets]
Network analysis
Another approach
Imagine you want to know who talks to whom and how networks are interconnected. Use a tool like NodeXL or Gephi!
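Before such a tool can draw anything, the data has to be turned into a list of who addresses whom. One possible way to do this for tweets, assuming that an @mention counts as "talking to" someone, is sketched below. The tweets are invented, and the Source/Target/Weight header follows the column names Gephi recognizes when importing an edge list from a spreadsheet.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed edges (author -> mentioned user) over (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            edges[(author.lower(), mentioned.lower())] += 1
    return edges


def edges_to_csv(edges):
    """Serialize the edges as CSV text with a Source,Target,Weight header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Invented example tweets as (author, text) pairs.
tweets = [
    ("alice", "@bob did you see the debate?"),
    ("bob", "@alice yes, and so did @carol"),
    ("alice", "@bob good point"),
]
edges = mention_edges(tweets)
print(edges_to_csv(edges))
```

The weight column records how often one user mentioned another, so repeated conversations show up as stronger ties in the visualization.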
Problems
You sometimes depend entirely on commercial parties
• Services can shut down (Google Reader) or change their API (Twitter)
• It’s rather easy to get (up to 3200) tweets from a specific user (e.g., allmytweets.net), but if you want to capture a #hashtag, you have to record it live
• Twitter doesn’t give you all tweets, but just about 1% (+ a bunch of other limits)
Not sure if this is a problem or a great opportunity ...
You cannot rely (only) on ready-made software but should get ready to use tools like bash scripts, grep, Python, ... (Which can be fun!)
A glimpse in the kitchen
What I’m doing right now
Analyzing #tvduell
• 570,000 tweets
• Identifying clusters of nouns, verbs and adjectives
• Assigning positivity and negativity scores to tweets
• Seeing if they can be interpreted as frames
⇒ How are Merkel and Steinbrück framed on the second screen during the debate?
Something you can use?
Questions or comments?
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net