25
Introduction to Social Network Analysis with R Kayleigh Bohemier and Breanne Chryst April 22, 2016

Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Introduction to Social Network Analysis with R

Kayleigh Bohemier and Breanne Chryst

April 22, 2016

Page 2: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Outline

IntroductionIntroduction to Network DataData Format and Gathering

Basic Social Network Analysis (SNA) in RSoft Intro to RNetwork VisualizationsNetwork Statistics

Try it out

2 of 25

Page 3: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Introduction

Page 4: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Vertices/Nodes and Edges

I Wikipedia defines Social Networks as ”a social structure made upof a set of social actors (such as individuals or organizations), setsof dyadic ties, and other social interactions between actors.”

I In network parlance the social actors (people, groups, nations) arenode or vertices and the ties between them are also called edges.

4 of 25

Page 5: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Directed and Undirected

I Directed graphs have edges originating from one node and endingin another. An example is twitter, if I follow you, there is an edgeoriginating from me and ending at you, but you may not follow meback.

I Undirected graphs have reciprocal edges. An example is Facebook,if I am your friend, you are also my friend.

I The decision of whether a network is directed or undirected is animportant one and can have major implications for the analysis.Careful thought should be spent in deciding on this point.

5 of 25

Page 6: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Adjacency Matrices

I The data for a social network can be organized as a matrix, withnon-zero entries in the i , j th entry of the i th node shares an edgewith the j th node.

I If the network is undirected this matrix is symmetric.

I If the edges have weights, the entries in the non-zero entries arethe weights for the edges.

I If the edges are not weighted, the non-zero entries are simply one.

6 of 25

Page 7: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Adjacency Matrices

7 of 25

Page 8: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Edgelists

I Data can be stored as an edgelist, specifying the nodes that have atie between them in the graph.

I If the network is directed the first entry is the start of the edgeand the second is the end of the edge (i.e. there is an arrowconnecting the two nodes pointing in the direction of the second).

I For undirected networks there is only one entry per connection andthe order doesn’t matter.

8 of 25

Page 9: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Network Graph

Bert

Ernie

AdamEve

Batman

Joker

Sherlock

Watson

Robin

Ivy

Moriarty

9 of 25

Page 10: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Publicly Available Datasets

I UC Irvine has a nicely curated list of publicly available networkdata at https://networkdata.ics.uci.edu/resources.php

I Stanford Large Network Dataset Collection:https://snap.stanford.edu/data/

I Of course there are many more resources throughout the internet.

10 of 25

Page 11: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

API

I Poplication Program Interface (API) allow users to interact withprograms. Facebook and Twitter are two of the main sources thatcome to mind when thinking about social network data. You maybe able to access some data from these sites through their API.

I Twitter: https://dev.twitter.com/overview/apiI Facebook:

https:

//developers.facebook.com/docs/graph-api/overview

11 of 25

Page 12: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

From the Field

I Gather your own social network data through observations,literature, etc.

I When gathering your own data it is important to think about whatconstitutes a node and an edge in your data, before you startcollecting the data.

12 of 25

Page 13: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Basic Social Network Analysis(SNA) in R

Page 14: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Intro to R

I R is both a statistical package (like SPSS, Stata, SAS, etc.) and aprogramming language.

I More accurately, R is an environment within which you can dostatistical programming and graphics.

I R provides an environment and a language that allow you tocompute, analyze, and graph your data - and so much more.

I R is an object-oriented programming language.

I Everything you do and create in R, including statistical models,functions, graphics, and output, is an object stored in memorythat can be manipulated, saved, and reused for greater efficiencyand power.

14 of 25

Page 15: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Brief Intro in R

I Lets try out some R!

15 of 25

Page 16: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Reading network Data into R

I Reading network data into R depends on the data file type.

I read.csv can be used for reading in .csv files

I read.table can be used for reading in .txt files

I For other file types, you may need packages like foreign

16 of 25

Page 17: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Local Network Statistics

I Centrality: a measure of importance for vertices in the network.

I Common measures of centrality include degree, closeness,betweeness, and eigenvector centrality.

17 of 25

Page 18: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Degree

I The degree is the number of edges for a node (number of alters orsocial connections in the network).

3

3

4

3 2

2

3

3

2

1

18 of 25

Page 19: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Closeness

I Closeness is the number of nodes ”close” in geodesic distance orshortest path between vertices.

0.04

0.04

0.06

0.06 0.05

0.04

0.06

0.04

0.04

0.03

19 of 25

Page 20: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Betweeness

I Betweeness measures the extent to which the vertex connectsdisparate groups.

0.83

2.1713.67

20.83

2.5

0

18

8

0

0

20 of 25

Page 21: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Eigenvector

I Eigenvector centrality uses the first eigenvector to create acentrality measure that is a function of its neighbors centrality.

0.84

0.81

0.640.5

0.63

0.37

0.22

0.2

0.08

21 of 25

Page 22: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Global Network Statistics

I Average Shortest Path: the average shortest path between verticesin the graph.

I Connectivity/Density: the proportion of the possible edges that arerealized in the graph.

I Average Degree: mean of the degrees in the network.

22 of 25

Page 23: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Global Network Statistics

I Degree Distribution: the distributions of degrees for the network.this summarizes the connectivity of the network.

1 2 3 4 5 6

Degree Distribution for Kite Network

Degree

Freq

uenc

y

0.0

0.5

1.0

1.5

2.0

2.5

3.0

01

23

45

Degree Distribution for Partner Network

Degree

Freq

uenc

y

3 4 5 9

23 of 25

Page 24: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Global Network Statistics

I Clustering Coefficient/Transitivity: the number of connectedtriangles divided by the number of connected pairs in the graph.

I Partitioning: refers to separating the graph into k disjoint sets ofvertices, usually based on the connectivity within partitions.

I Assortivity/Homophily: describes the tendency for ”birds of afeather flock together.”

24 of 25

Page 25: Introduction to Social Network Analysis with Rblc3/SNA2016/Intro_SNA_Presentation.pdfOutline Introduction Introduction to Network Data Data Format and Gathering Basic Social Network

Try it out