Our Data, Ourselves
Research on Online Digital Cultures — Community Extraction from Twitter Networks by Markov Clustering
Department of Digital Humanities
Giles Greenway
Tobias Blanke
Jenifer Pybus
Mark Cote
A “mobile-data commons”?
• Can we write an app to capture the data-trails that smartphones transmit to third parties and make them available?
• NO.
• This would require rooting the 'phones. An Android phone is a Linux system, where the end user typically doesn't have admin rights.
• If the app reaches a mass audience, we cannot expect users to root their phones. Some rooting software contains malware, and we cannot ensure that users root their devices safely. http://tinyurl.com/weidmandroid
A “mobile-data commons”?
20 young coders from Young RewiredState (YRS) were issued with Android smartphones.
'Phones were pre-loaded with our “MobileMiner” app, that logs app network traffic, GSM cells, app notifications and WIFI network connections.
The data is logged by a CKAN server, and also made available to users on their devices.
Twitter accounts were also scraped.
Is net activity a proxy for app usage?
Sometimes not...
Some apps use analytics / ad services continually.
This provoked a workshop on app reversal and network traffic capture: http://kingsbsd.github.io/DroidDestructionKit
Notifications as a proxy for social network usage.
[Figure: Twitter Network Degree vs Notifications. Series: Friends, Followers. x-axis: Number of Notifications; y-axis: friends / followers count.]
Twitter sends notifications based on the people you follow. The more notifications, the more friends.
Questions to ask of Twitter
● How many different “tribes” does the average teenage hacker have?
● What do they Tweet about?
● Do they use it conversationally? What's the distribution of lengths of chains of tweets and replies?
Need a community-detection algorithm:
● Easy to implement.
● Can be explained to non-technical cultural-studies academics in three slides!
● Returns realistic communities.
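The question about chains of tweets and replies can be made concrete with a small sketch. It assumes each scraped tweet carries the id of the tweet it replies to (for example Twitter's in_reply_to_status_id field); the `reply_to` mapping and both function names are hypothetical, not taken from the project.

```python
from collections import Counter

def chain_length(tweet_id, reply_to):
    """Count the tweets from `tweet_id` back to the start of its thread."""
    length = 1
    seen = {tweet_id}
    # Follow reply links upwards; `seen` guards against malformed cycles.
    while reply_to.get(tweet_id) is not None and reply_to[tweet_id] not in seen:
        tweet_id = reply_to[tweet_id]
        seen.add(tweet_id)
        length += 1
    return length

def chain_length_distribution(reply_to):
    """Histogram of chain lengths, measured from tweets nobody replied to."""
    replied_to = {parent for parent in reply_to.values() if parent is not None}
    last_tweets = [t for t in reply_to if t not in replied_to]
    return Counter(chain_length(t, reply_to) for t in last_tweets)

# Hypothetical data: tweet id -> id of the tweet it replies to (None = original).
reply_to = {1: None, 2: 1, 3: 2, 4: None, 5: 4, 6: None}
distribution = chain_length_distribution(reply_to)
```

Here the three threads have lengths 3, 2 and 1, so each length is counted once. Branching conversations are counted once per final reply, which is one of several reasonable conventions.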
Markov Clustering - MCL
● There are clusters of Twitter users with densely connected networks of friend/follower relationships.
• If you take a random walk around the network, you are likely to stay within the cluster you started in.
http://www.micans.org/mcl/
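The random-walk intuition can be checked numerically on a toy graph. The sketch below uses a hypothetical network of two triangles joined by a single edge (not the Twitter data) and computes the probability that a short walk is still inside the triangle it started in.

```python
import numpy as np

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adjacency = np.zeros((n, n))
for a, b in edges:
    adjacency[a, b] = adjacency[b, a] = 1.0

# Column j holds the probabilities of stepping from node j to each neighbour.
transition = adjacency / adjacency.sum(axis=0)

def stay_probability(start, cluster, steps):
    """Probability that a walk from `start` is inside `cluster` after `steps` steps."""
    state = np.zeros(n)
    state[start] = 1.0
    state = np.linalg.matrix_power(transition, steps) @ state
    return state[sorted(cluster)].sum()

p_stay = stay_probability(start=0, cluster={0, 1, 2}, steps=3)
```

After three steps the walk is still far more likely to be inside its starting triangle than outside it. Over very long walks this advantage fades, which is why MCL works with short walks and repeatedly sharpens the probabilities.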
MCL - A Trivial Example
1: Build an adjacency matrix for the graph.
2: Normalize the columns to produce transition probabilities.
MCL - A Trivial Example
4: Element-wise square the matrix and re-normalize.
5: Rinse and repeat until convergence.
The matrix entries will be 0 or 1. Interpret rows as: “If I'm in this row node, which column nodes are credible start-points?”
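The numbered steps can be sketched in NumPy as follows. Step 3 is not present in the text above; this sketch assumes it is the usual MCL expansion step (multiplying the matrix by itself), and it also adds self-loops, a common MCL convention. Both are assumptions, and the function name and parameters are hypothetical.

```python
import numpy as np

def mcl_clusters(adjacency, inflation=2, max_iter=100):
    """Minimal Markov Clustering sketch; returns clusters as sorted tuples."""
    matrix = np.asarray(adjacency, dtype=float)
    matrix = matrix + np.eye(len(matrix))      # assumption: add self-loops
    matrix = matrix / matrix.sum(axis=0)       # step 2: column-normalize
    for _ in range(max_iter):
        previous = matrix
        matrix = matrix @ matrix               # assumed step 3: expansion
        matrix = matrix ** inflation           # step 4: element-wise power
        matrix = matrix / matrix.sum(axis=0)   # step 4: re-normalize
        if np.allclose(matrix, previous):      # step 5: stop at convergence
            break
    # Each non-empty row lists the start nodes that flow into that row's node.
    clusters = set()
    for row in matrix:
        members = tuple(int(j) for j in np.flatnonzero(row > 1e-6))
        if members:
            clusters.add(members)
    return sorted(clusters)

# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for a, b in edges:
    adjacency[a, b] = adjacency[b, a] = 1.0

clusters = mcl_clusters(adjacency)
```

On this toy graph the two triangles come out as separate clusters. Raising `inflation` generally gives smaller, tighter clusters; dense matrices like this one are only practical for small networks.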
MCL - Does it work?
MCL was applied to two Twitter accounts of digital culture researchers with ~7000 once-removed friend-follower relationships.
Gephi's “OpenOrd” layout is meant to emphasise clusters. Are nodes in the same cluster close together?
Compare with Gephi's own “modularity algorithm”, the Louvain method.
MCL - Does it work?
Louvain: Twitter accounts in the same cluster are placed close together.
MCL: Accounts in the same cluster are scattered.
This suggests that Louvain performed better than MCL.
WRONG!
MCL - Does it work?
Why did Gephi/Louvain put these two in the same modularity class / cluster?
Researchers rated clusters for both methods:
Cluster is identifiable and relevant: MCL 20%, Louvain 0%!
Cluster is not identifiable, but possibly relevant: 37%
Cluster is neither identifiable nor relevant: 43%
MCL - Does it work?
Why did the Louvain method perform poorly?
- The Louvain method works by combining smaller clusters to maximize modularity. Does the very high degree of Twitter networks harm its performance? One wrongly placed Twitter account pulls in many others.
Why was the OpenOrd layout misleading?
- Both OpenOrd and Louvain work by combining smaller clusters. Both are vulnerable to the same problems.
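For readers unfamiliar with modularity, the quantity that Louvain tries to maximize can be sketched as a score for a given partition. This is the standard Newman modularity for an undirected graph, not the Louvain search itself, and the toy graph and names below are hypothetical.

```python
import numpy as np

def modularity(adjacency, labels):
    """Newman modularity Q of a partition of an undirected graph."""
    a = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    degrees = a.sum(axis=0)
    two_m = degrees.sum()                        # twice the number of edges
    same = labels[:, None] == labels[None, :]    # True where nodes share a cluster
    expected = np.outer(degrees, degrees) / two_m
    return ((a - expected) * same).sum() / two_m

# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for a, b in edges:
    adjacency[a, b] = adjacency[b, a] = 1.0

q_split = modularity(adjacency, [0, 0, 0, 1, 1, 1])   # one cluster per triangle
q_merged = modularity(adjacency, [0, 0, 0, 0, 0, 0])  # everything in one cluster
```

Splitting the triangles scores higher than lumping all nodes together, which is the kind of comparison Louvain makes each time it considers merging clusters.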
MCL - Does it work?
MCL can suggest plausible Twitter communities.
Can it find pre-existing ones?
Repeat for the YRS volunteers:
[Figure panels: MCL, Louvain]
MCL - Does it work?
Do the Twitter accounts of 9 YRS volunteers end up in the same cluster?
MCL: Mostly...
Cluster Size 20 26 6 6 5 45 5 319 6 5 14 14 5
YRS accounts 0 0 0 0 0 1 0 8 0 0 0 0 0
Louvain: Not so much...
Cluster Size 15 78 7 43 168 67 55 230 24
YRS accounts 0 1 0 0 0 3 2 3 0
[ ~4% probability of allocating 8 Twitter users to the largest MCL cluster by chance. ]
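One way to arrive at a figure of about 4% is sketched below. It assumes each account lands in a cluster independently, with probability proportional to cluster size; the slides do not say how the number was actually computed, so treat this as a plausible reconstruction only.

```python
# MCL cluster sizes from the table above.
mcl_cluster_sizes = [20, 26, 6, 6, 5, 45, 5, 319, 6, 5, 14, 14, 5]

# Chance that one randomly placed account falls in the largest cluster.
share_of_largest = max(mcl_cluster_sizes) / sum(mcl_cluster_sizes)

# Chance that 8 independently placed accounts all fall in it.
p_eight_by_chance = share_of_largest ** 8
```

Under these assumptions `p_eight_by_chance` comes out at roughly 0.04, in line with the quoted value.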
Is inferring from layouts always problematic?
-Of course not!
Theban scribes with common contracting parties
Source: Silke Vanbeselaere http://tinyurl.com/thebanscribes
What do the clusters tweet about?
Top tags for the MCL clusters, with counts in brackets:
Cluster size 6, YRS accounts 0: dotnetnotts (18); TechNott (10); NottsTest (8); JavaScript (2); hack24 (2); ukbestworkplace (2)
Cluster size 45, YRS accounts 1: GE2015 (78); Eurovision, 2015 (58); leadersdebate (33); bdw2015 (24); BattleForNumber10 (21); BBCQT (18); GBR (15); bbcqt (14); eurovision (14); NHTG15 (13); FoC2015 (12); YRSAmbassadors (11); depop (11); BBCFreeSpeech (10); VoteConverative (9); YRS2014 (9); DimblebyLecture (9); endpointcon (9)
Cluster size 319, YRS accounts 8: GE2015 (275); tech (214); jobs (207); YRS2014 (185); Haunted (183); ghosts (183); YRSFoc (181); hackmcr (167); yrs2014 (156); Arduino (149); FoC2015 (141); Norwich (133); gamedev (132); TG (130); BigData (112); linux (111); YRSHyperlocal (105); design (99)
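A tag table like the one above can be produced with a few lines of standard-library Python. This is a minimal sketch, assuming tweets are available as (account, text) pairs and that each account has already been assigned a cluster; the data, the `cluster_of` mapping and the function name are hypothetical.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")

def top_tags(tweets, cluster_of, n=3):
    """Return the n most frequent hashtags for each cluster."""
    counts = defaultdict(Counter)
    for account, text in tweets:
        # Tags are counted case-sensitively, so yrs2014 and YRS2014 stay distinct.
        counts[cluster_of[account]].update(HASHTAG.findall(text))
    return {cluster: tags.most_common(n) for cluster, tags in counts.items()}

# Hypothetical sample data.
tweets = [
    ("alice", "Voting today #GE2015"),
    ("bob", "#GE2015 results and #tech jobs"),
    ("carol", "Meetup tonight #dotnetnotts"),
]
cluster_of = {"alice": 0, "bob": 0, "carol": 1}
tags = top_tags(tweets, cluster_of)
```

Folding case before counting would merge variants such as BBCQT and bbcqt; whether that is desirable depends on the question being asked.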
Conclusions:
● Acquire Twitter data with Twython/Celery/Redis/RabbitMQ.
● Store Twitter data with Neo4J/Py2Neo.
● Perform MCL with NumPy.
● Export to Gephi with NetworkX.
● Gephi and the Louvain method are fine tools, but use them carefully!
● MCL is very effective (if slow) at extracting Twitter communities.
● Numerical techniques should be easy to justify and validate.
● Visualizations are powerful, persuasive, and sometimes misleading! (“Beware of geeks bearing .gifs!”)
The tools:
Download our app: http://kingsbsd.github.io/MobileMiner
Follow us on Twitter: @KingsBSD
Read our blog: http://big-social-data.net/
Read about our data: http://tinyurl.com/miningmobileyouth
Slideshare: http://www.slideshare.net/kingsBSD/