NETWORK ANALYSIS: PEOPLE AND COMMUNITIES

Dawn M. Foster

@geekygirldawn  [email protected]  fastwonderblog.com

PhD Student, University of Greenwich

London, UK

Network Analysis: Tech Evangelism London Meetup



WHOAMI

• Geek, traveler, reader

• 20 year tech career. Past 15 years doing community & open source (Intel, Jive, Puppet Labs, etc.)

• PhD student at University of Greenwich researching Linux kernel

Photos by Josh Bancroft, Don Park

WHAT IS NETWORK ANALYSIS?

Studies relationships between units and looks for patterns and structure in those relationships

Image from ANAMIA Project

AGENDA AND INFO

• Gathering your data

• Data manipulation for network analysis

• Visualization

• What else can you do?

Image from a Northern Mariana Islands Network

Scripts, Data, and More: github.com/geekygirldawn/linuxcon_2015

I 💖 METRICS GRIMOIRE

MailingListStats aka MLStats

CVSAnalY - repos

Bicho - bugs

More

Photo by Bitergia

http://metricsgrimoire.github.io/

MLSTATS

a) Install mlstats

$ python setup.py install

b) Create database

mysql> create database mlstats;

c) Import data by running mlstats

$ mlstats --db-user=USERNAME --db-password=PASS http://URLOFYOURLIST

MLSTATS: EXTRACT DATA

SELECT mp.email_address AS sender,          -- people sending emails
       (SELECT mp2.email_address            -- subquery: who they replied to
          FROM messages m2, messages_people mp2
         WHERE m2.is_response_of = m.is_response_of
           AND mp2.message_id = m2.is_response_of
         LIMIT 1) AS receiver
  FROM messages_people mp, messages m
 WHERE YEAR(m.first_date) = 2015            -- limit time for manageable data
   AND MONTH(m.first_date) = 1
   AND mp.message_id = m.message_id;

Network Analysis Output for R / Visone:

[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
...
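The step from query result to edge list can be sketched in Python. This is a minimal illustration, not the linuxcon.py script from the repo: the function name `write_edge_list`, the example addresses, and the choice of a space-separated "sender receiver" line per edge are all assumptions. It also assumes that messages starting a thread come back with an empty receiver, since the subquery finds nobody they replied to.

```python
import io


def write_edge_list(rows, out):
    """Write (sender, receiver) rows as one 'sender receiver' line per edge.

    Rows with a missing sender or receiver are skipped (assumed to be
    thread-starting messages). Returns the number of edges written.
    """
    written = 0
    for sender, receiver in rows:
        if not sender or not receiver:
            continue
        # Lower-case addresses so the same person is not split into two nodes
        out.write(f"{sender.strip().lower()} {receiver.strip().lower()}\n")
        written += 1
    return written


# Hypothetical rows standing in for the result of the SQL query
rows = [
    ("alice@example.org", "bob@example.org"),
    ("Bob@example.org", "alice@example.org"),
    ("carol@example.org", None),  # started a thread, replied to nobody
]

buffer = io.StringIO()
count = write_edge_list(rows, buffer)
print(buffer.getvalue(), end="")
```

In practice `out` would be a file opened for writing, and `rows` would come from a database cursor rather than a hard-coded list.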

EXTRACT DATA: SCRIPTS

Reformat / clean up data

Reproducible

Reduce human error

linuxcon.py script

Image from Mark Grealish

github.com/geekygirldawn/linuxcon_2015

R / VISONE / GOURCE

Convert data for better use with network analysis

Visualize data using RStudio, Visone, and Gource

Image from WebOps.com
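One common conversion before visualizing is to collapse repeated reply pairs into weighted edges, so that line thickness can reflect how often two people interact. The sketch below shows that idea in plain Python rather than R; the function name `weight_edges`, the sample names, and the decision to drop self-replies are assumptions made for illustration, not steps taken from the talk.

```python
from collections import Counter


def weight_edges(pairs):
    """Collapse repeated (sender, receiver) pairs into (sender, receiver, weight)."""
    # Assumption: replies to oneself carry no network information, so drop them
    counts = Counter((s, r) for s, r in pairs if s != r)
    # Heaviest edges first, then alphabetical so the output order is stable
    return sorted(
        ((s, r, w) for (s, r), w in counts.items()),
        key=lambda edge: (-edge[2], edge[0], edge[1]),
    )


# Hypothetical reply pairs
pairs = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "carol"),
]

edges = weight_edges(pairs)
for sender, receiver, weight in edges:
    print(sender, receiver, weight)
```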

WHAT ELSE?

So many visualization tools

Python network packages

Other data sources / APIs

Network analysis is more than just pretty pictures!
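As one example of going beyond pictures, a few lines of Python can count how many distinct people each participant replies to and is replied to by. This is a minimal sketch using only the standard library; the function name `degree_counts` and the sample names are hypothetical, and a dedicated network package would offer many more measures.

```python
from collections import defaultdict


def degree_counts(pairs):
    """Return {person: (in_degree, out_degree)} over distinct neighbours.

    in_degree  = number of different people who replied to this person
    out_degree = number of different people this person replied to
    """
    replied_to = defaultdict(set)  # person -> people they replied to
    replied_by = defaultdict(set)  # person -> people who replied to them
    for sender, receiver in pairs:
        if sender == receiver:
            continue  # ignore self-replies
        replied_to[sender].add(receiver)
        replied_by[receiver].add(sender)
    people = set(replied_to) | set(replied_by)
    return {
        p: (len(replied_by.get(p, ())), len(replied_to.get(p, ())))
        for p in people
    }


# Hypothetical reply pairs
pairs = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
    ("alice", "bob"),
]

degrees = degree_counts(pairs)
for person in sorted(degrees):
    print(person, degrees[person])
```

A high in-degree suggests someone whose messages attract replies from many different people, which a picture alone can only hint at.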

Dawn Foster

University of Greenwich

Centre for Business Network Analysis

www2.gre.ac.uk/about/faculty/business/research/centres/cbna/home

@geekygirldawn, [email protected]

THANK YOU

BACKUP

Stuff I don't have time to cover, but that you might find interesting.

GOURCE CUSTOM FORMAT

Pipe Separated File

timestamp - A unix timestamp of when the update occurred.

username - The name of the user who made the update.

type - Update type: (A)dded, (M)odified or (D)eleted.

file - Path of the file.

color - Color for the file in hex (FFFFFF) format (optional).

Examples:

1275543595|andrew|A|src/main.cpp

1275543700|bob|M|src/main.cpp

https://github.com/acaudwell/Gource/wiki/Custom-Log-Format

EXAMPLE:

a) Extract data using mlstats / database queries

b) Generate Gource custom format (pipe sep file)

unixtime|user-email_sender|A|new

unixtime|user-email_sender|M|user-in_response_to

OR) Run linuxcon.py from my linuxcon_2015 repo (a & b)

c) Run Gource

$ gource -i 10 --max-user-speed 100 -a 1 --highlight-users gource_output.log

github.com/geekygirldawn/linuxcon_2015
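Step b) can be sketched as follows. This is one reading of the two line patterns on the slide, not the linuxcon.py script itself: a message that starts a thread becomes an (A)dded entry for the literal path `new`, and a reply becomes a (M)odified entry whose path is the person being replied to. The function name `to_gource_lines`, the tuple layout, and the sample data are assumptions.

```python
def to_gource_lines(messages):
    """Turn (unix_time, sender, in_response_to) tuples into Gource log lines.

    in_response_to is None for a message that starts a new thread.
    Entries are emitted in chronological order.
    """
    lines = []
    for unix_time, sender, in_response_to in sorted(messages, key=lambda m: m[0]):
        if in_response_to is None:
            # New thread: unixtime|user-email_sender|A|new
            lines.append(f"{int(unix_time)}|{sender}|A|new")
        else:
            # Reply: unixtime|user-email_sender|M|user-in_response_to
            lines.append(f"{int(unix_time)}|{sender}|M|{in_response_to}")
    return lines


# Hypothetical messages, deliberately out of order
messages = [
    (1420070500, "bob@example.org", "alice@example.org"),
    (1420070400, "alice@example.org", None),
]

lines = to_gource_lines(messages)
print("\n".join(lines))
```

Writing these lines to gource_output.log would give a file in the pipe-separated shape that the command in step c) takes as input.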

OTHER OPTIONS

Bug data

Wikis

Other stuff

https://github.com/acaudwell/Gource/wiki/Custom-Log-Format

Photo by Bitergia