Visually Analyzing People
Leo Meyerovich (@LMeyerov), CEO
Graphistry is: Supercharging visual analytics through GPU cloud streaming.
(We tricky graphs.)
CASE STUDY: TWITTER FRAUD
Naïve layouts on 1K+ node graphs give impenetrable hairballs.
Gauss-Seidel Force-Directed Graph, O(N^2) n-body, GPU
Node: Twitter account. Edge: Friendship.
Friends and friend-of-friends of a bot who randomly messaged real people and retweeted them.
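The O(N^2) in the caption comes from every node repelling every other node. A minimal pure-Python sketch of that naive scheme follows. It assumes Fruchterman-Reingold-style forces (k²/d repulsion between all pairs, d²/k spring attraction along edges) and reads "Gauss-Seidel" as sweeping through the nodes and updating each position in place, so later nodes in a sweep already see earlier nodes' new positions. The function name, constants, and cooling schedule are illustrative choices, not Graphistry's GPU implementation.

```python
import math
import random


def force_directed_layout(n, edges, iterations=200, k=1.0, step=0.1, seed=0):
    """Naive O(N^2) force-directed layout with in-place (Gauss-Seidel-style) updates.

    n: number of nodes (labeled 0..n-1); edges: list of (a, b) pairs.
    Returns a list of [x, y] positions.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    neighbors = [set() for _ in range(n)]
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    for it in range(iterations):
        temp = step * (1 - it / iterations)  # linear cooling caps each move
        for i in range(n):  # one sweep; pos[i] is overwritten immediately
            fx = fy = 0.0
            for j in range(n):  # every other node: O(N) per node, O(N^2) per sweep
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d  # repulsion pushes i away from j
                if j in neighbors[i]:
                    f -= d * d / k  # spring attraction along edges
                fx += f * dx / d
                fy += f * dy / d
            mag = math.hypot(fx, fy)
            if mag > 0:
                move = min(temp, mag)
                pos[i][0] += move * fx / mag
                pos[i][1] += move * fy / mag
    return pos
```

Because every sweep touches all N² pairs, this is exactly the cost that makes naive layouts impractical past a few thousand nodes without a GPU or an approximation.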
Even on a small graph (77 nodes), smart design starts adding clarity
With smart layouts, fake account clusters pop out.
ForceAtlas2 Layout, O(n log n) n-body, GPU
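ForceAtlas2 changes the force model: repulsion grows with the degrees of both endpoints, attraction is linear in distance, and a degree-weighted gravity keeps components from drifting apart. The sketch below computes those per-node forces naively in O(n^2), assuming the usual published form of the model. It omits both the Barnes-Hut approximation that typically brings repulsion down to O(n log n) and FA2's adaptive per-node speeds. `fa2_forces`, `kr`, and `kg` are hypothetical names.

```python
import math


def fa2_forces(pos, edges, kr=1.0, kg=1.0):
    """Per-node force vectors under ForceAtlas2's force model (naive O(n^2) repulsion)."""
    n = len(pos)
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    force = [[0.0, 0.0] for _ in range(n)]

    # Repulsion: magnitude kr * (deg(a)+1) * (deg(b)+1) / d, pushing the pair apart.
    for a in range(n):
        for b in range(a + 1, n):
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d2 = dx * dx + dy * dy
            if d2 == 0:
                continue
            f = kr * (deg[a] + 1) * (deg[b] + 1) / d2  # magnitude / d, times (dx, dy)
            force[a][0] += f * dx
            force[a][1] += f * dy
            force[b][0] -= f * dx
            force[b][1] -= f * dy

    # Attraction: magnitude equal to the distance, pulling edge endpoints together.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        force[a][0] += dx
        force[a][1] += dy
        force[b][0] -= dx
        force[b][1] -= dy

    # Gravity: magnitude kg * (deg+1), pointing toward the origin.
    for i in range(n):
        d = math.hypot(pos[i][0], pos[i][1])
        if d > 0:
            g = kg * (deg[i] + 1) / d
            force[i][0] -= g * pos[i][0]
            force[i][1] -= g * pos[i][1]
    return force
```

For two connected degree-1 nodes with gravity off, repulsion 4/d balances attraction d at distance 2, so `fa2_forces([[0, 0], [2, 0]], [(0, 1)], kg=0.0)` returns zero force on both nodes. The degree weighting is what lets dense clusters, such as the fake accounts here, separate visibly from hubs.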
The spambot is an entry point to more bots…
Obviously fake account names
A quiet small business who buys virtual game currency from gamers…
Who somehow got exactly 1 message massively trended & advertised by Twitter
Diagram labels: spammer, laundering accounts, bot retweet network
It’s a “retweet laundering” botnet! It tricks Twitter into targeting gamers to check out a cyberfraud site. They steal gamers’ money and identities.
Relationships are hard to see without graphs with smart layouts & interactions.
Next step: explore the time dimension. Ex: how do mobs launch from Twitter?
Leo A. Meyerovich, @lmeyerov, Graphistry
Ariel S. Rabkin, @asrabkin, Cloudera
THE SOCIOLOGY OF PROGRAMMING LANGUAGES (adoption)
http://hammerprinciple.com/therighttool
~14,000 developers
Fastest? C > Java > JavaScript > Pascal
Safest? Java > Pascal > JavaScript > C
Goal: Rank Beliefs
Programmers won’t agree on a ranking.
Idea: Chess Ranking
Let’s run a competition for the friendliest language! (Glicko-2)
Each survey response is a game match:
1. Person A says Python beats C in friendliness
2. Person A says Java beats C in friendliness
3. Person B says C beats APL in friendliness
…
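Treating each judgment as a game amounts to a small conversion step. The records below are hypothetical stand-ins for survey rows, handled one belief at a time; each judgment becomes a win (score 1) for one language and a loss (score 0) for the other. `to_matches` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical survey records for one belief ("friendliness"): (respondent, winner, loser).
responses = [
    ("A", "Python", "C"),
    ("A", "Java", "C"),
    ("B", "C", "APL"),
]


def to_matches(responses):
    """Turn each pairwise judgment into a game result for both languages."""
    games = defaultdict(list)  # language -> list of (opponent, score)
    for _person, winner, loser in responses:
        games[winner].append((loser, 1.0))
        games[loser].append((winner, 0.0))
    return dict(games)
```

Every language then has a list of match results that a chess-style rating system can consume.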
Score Points Set by a Bookie
Every language starts with rank 1000.
1. “Person A: Python friendlier than C” → Python’s rank goes up
2. “Person B: Python friendlier than C” → Python already > C, so a less valuable win
3. “Person C: Haskell friendlier than Python” → Problem: little is known about Haskell (“sparse”). Haskell beat a high-rank language: big rank increase!
(Bayesian!)
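The scoring rules above match what Glicko-2 provides: each language carries a rating, a rating deviation (how little is known about it), and a volatility, and a win moves a high-deviation language further than an established one. The sketch below follows Glickman's single-player update for one rating period; the function and parameter names, and the system constant `tau=0.5`, are our choices. The formulas depend only on rating differences, so starting every language at 1000 as on the slide behaves like the standard 1500 start shifted down.

```python
import math

SCALE = 173.7178  # Glicko-2 factor between the rating scale and its internal scale


def _g(phi):
    # Down-weights results against opponents whose own rating is uncertain.
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def _expected(mu, mu_j, phi_j):
    # Expected score against opponent j.
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(r, rd, sigma, games, tau=0.5, eps=1e-6):
    """One Glicko-2 rating period for one player (here: one language).

    games: list of (opponent_rating, opponent_rd, score), score 1 = win, 0 = loss.
    Returns (new_rating, new_rd, new_volatility).
    """
    mu, phi = (r - 1500.0) / SCALE, rd / SCALE
    if not games:  # no matches: rating stays, uncertainty grows
        return r, math.sqrt(phi ** 2 + sigma ** 2) * SCALE, sigma

    opp = [((rj - 1500.0) / SCALE, rdj / SCALE, s) for rj, rdj, s in games]
    v = 1.0 / sum(_g(pj) ** 2 * _expected(mu, mj, pj) * (1 - _expected(mu, mj, pj))
                  for mj, pj, _ in opp)
    score_sum = sum(_g(pj) * (s - _expected(mu, mj, pj)) for mj, pj, s in opp)
    delta = v * score_sum

    # New volatility: root of f via the Illinois variant of regula falsi.
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex) / (2.0 * (phi ** 2 + v + ex) ** 2)
                - (x - a) / tau ** 2)

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_sigma = math.exp(A / 2.0)

    # Widen the deviation by the volatility, then narrow it with this period's games.
    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * score_sum
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_sigma
```

A sparse language (deviation 350) that beats a 1700-rated rival gains more than an established language (deviation 50) with the same rating and the same win, which is the Haskell case above; this is the "Bayesian" part, since the deviation acts as prior uncertainty.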
Many Tournaments = Correlation Matrix!
Language × Belief
Cluster (K-Means)
Reduce Dimensionality: Pick fun languages & cluster centers
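Once every belief has its own tournament, each language has one rating per belief, which gives the Language × Belief matrix. A minimal Lloyd's k-means over those rows might look like the following; the rows are assumed to be per-belief ratings, and the function name and random seeding are illustrative.

```python
import random


def kmeans(rows, k, iterations=50, seed=0):
    """Lloyd's k-means on a list of equal-length numeric rows.

    Returns (centers, labels), where labels[i] is the cluster index of rows[i].
    """
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]  # start from k distinct rows
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centers[c])))
            for row in rows
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels
```

The resulting centers are one way to pick representative "belief profiles" to show alongside a few hand-picked languages.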
Graphs are (Adjacency) Matrices
Correlation Matrices are Fuzzy Graphs
[Figure: correlation matrix drawn as a graph, with edges labeled by weight (e.g., 0.5)]
Weak Edges Are Annoying!
Filter: Only Show Strong Relationships
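Reading the correlation matrix as a weighted graph and keeping only strong edges can be sketched as below. This assumes Pearson correlation between columns of per-respondent answers and a hypothetical cutoff of |r| ≥ 0.5; the slides do not say which correlation measure or cutoff was used.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_graph(columns, threshold=0.5):
    """Treat the correlation matrix as a weighted graph and keep only strong edges.

    columns: dict name -> list of numbers (one value per observation).
    Returns a list of (name_a, name_b, r) with |r| >= threshold.
    """
    edges = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges
```

The surviving edge list can be handed straight to a graph layout; lowering the cutoff brings back the weak edges that clutter the picture.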
Relationships are hard to see without graphs with smart layouts & interactions.
Step 2 of analysis is to correlate (step 1 is to count).
Correlations are relationships, so explore them as graphs!
200K projects (2000-2010) [PLATEAU 2013]
[Figure: Popularity Across Niches. Popularity of Java and of Scheme in each of 223 project categories.]
[Figure: example niches: blogging, search, build tools]
[Figure: Popularity vs. Niche: Dispersion. Popularity plotted against dispersion across niches (σ / μ) for Prolog, VBScript, Scheme, Fortran, PL/SQL, Assembly, C#, and Java.]
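The dispersion axis is σ / μ, the coefficient of variation of a language's popularity across project categories: near 0 means evenly popular everywhere, large values mean concentrated in a few niches. A minimal sketch, assuming the population standard deviation (the slide does not say which) and hypothetical per-niche shares; `niche_dispersion`, `EvenLang`, and `NicheLang` are illustrative names.

```python
from statistics import mean, pstdev


def niche_dispersion(shares):
    """Coefficient of variation (sigma / mu) of one language's popularity across niches.

    shares: the language's popularity in each project category.
    Uses the population standard deviation; returns inf if the mean is 0.
    """
    mu = mean(shares)
    return pstdev(shares) / mu if mu else float("inf")


# Hypothetical per-niche shares, for illustration only.
shares = {
    "EvenLang": [0.10, 0.12, 0.11, 0.09],   # similar everywhere: low dispersion
    "NicheLang": [0.00, 0.00, 0.01, 0.20],  # concentrated in one niche: high dispersion
}
for lang, s in sorted(shares.items(), key=lambda kv: niche_dispersion(kv[1])):
    print(lang, round(niche_dispersion(s), 2))
```

Dividing by μ makes the measure scale-free, so a small niche language and a dominant one can share the same axis.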
[Figure: Language Use (survey). Proportion of projects for each language vs. language rank (decreasing), log-scaled axes.]
Java: winner takes all
Long tail: design for niches and grow
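The rank plot can be reproduced by sorting languages by project count and converting counts to shares. As one rough way to quantify the long tail, the sketch also fits a least-squares slope to log(share) vs. log(rank); the fit, the function names, and the data layout are assumptions, not the analysis behind the slide.

```python
import math


def rank_shares(counts):
    """Sort languages by project count; return (rank, language, share) rows."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(i + 1, lang, c / total) for i, (lang, c) in enumerate(ordered)]


def loglog_slope(rows):
    """Least-squares slope of log(share) vs. log(rank), skipping zero shares.

    On log-log axes a roughly straight line suggests a power-law-like tail.
    """
    pts = [(math.log(rank), math.log(share)) for rank, _, share in rows if share > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den
```

The first row's share measures how much the top language "takes all"; the slope summarizes how quickly the rest fall off.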
Survey of 1,679 Developers
Extrinsic factors dominate! (on last project)
FUTURE STEP: Now that we’ve counted things, let’s correlate them!
Topics in Free-form Responses
Answer Correlations
Relationships are hard to see without graphs with smart layouts & interactions.
Step 2 of analysis is to correlate (step 1 is to count).
Correlations are relationships, so explore them as graphs!
Powerful because correlations are everywhere: raw features, inferred topics, …