Social Network Analysis Inforte course on Big Social Data Analytics 2017 Dr. Jari Jussila Twitter: @jjussila Email: [email protected] GitHub: https://github.com/jjussila/BigSocialDataAnalytics

Big social data analytics - social network analysis


Page 1: Big social data analytics - social network analysis

Social Network Analysis

Inforte course on Big Social Data Analytics 2017

Dr. Jari Jussila

Twitter: @jjussila

Email: [email protected]

GitHub: https://github.com/jjussila/BigSocialDataAnalytics

Page 2: Big social data analytics - social network analysis

WEB

MOBILE AND SOCIAL MEDIA

ERP

CRM

Purchase & Transaction Records

Offers and Quotations

Customer Engagements

A/B Testing

Dynamic Pricing

Search Engine Marketing and Optimization

Target Marketing

Images and Videos

Speech to Text

Sensor Data

Application Log Data

SMS/MMS

Location Data

Social Network Analysis

From transactions to interactions

Social Media Posts

Customer Segmenting

Page 3: Big social data analytics - social network analysis

Network Analysis (NA) & Social Network Analysis (SNA)

Page 4: Big social data analytics - social network analysis

Graph and Matrix Representation of Networks

Matrix representation of each network:

Star

0 1 1 1 1 1 1
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0

Circle

0 1 0 0 0 0 1
1 0 1 0 0 0 0
0 1 0 1 0 0 0
0 0 1 0 1 0 0
0 0 0 1 0 1 0
0 0 0 0 1 0 1
1 0 0 0 0 1 0

Chain

0 1 1 0 0 0 0
1 0 0 1 0 0 0
1 0 0 0 1 0 0
0 1 0 0 0 1 0
0 0 1 0 0 0 1
0 0 0 1 0 0 0
0 0 0 0 1 0 0
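As a minimal sketch of how such matrices can be generated, the plain-Python functions below build adjacency matrices for the three topologies. The function names and the node numbering are assumptions made for illustration (the chain here numbers its nodes along the path, so its rows may be ordered differently from a hand-drawn example).

```python
def empty_matrix(n):
    """Return an n x n matrix filled with zeros."""
    return [[0] * n for _ in range(n)]


def star(n):
    """Node 0 is the hub; every other node is linked only to the hub."""
    a = empty_matrix(n)
    for i in range(1, n):
        a[0][i] = a[i][0] = 1
    return a


def circle(n):
    """Each node is linked to the next one; the last links back to the first."""
    a = empty_matrix(n)
    for i in range(n):
        j = (i + 1) % n
        a[i][j] = a[j][i] = 1
    return a


def chain(n):
    """Like the circle, but without the link that closes the loop."""
    a = empty_matrix(n)
    for i in range(n - 1):
        a[i][i + 1] = a[i + 1][i] = 1
    return a


if __name__ == "__main__":
    for name, build in (("Star", star), ("Circle", circle), ("Chain", chain)):
        print(name)
        for row in build(7):
            print(" ".join(str(v) for v in row))
```

All three networks are undirected, so every matrix is symmetric around its diagonal.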

Page 5: Big social data analytics - social network analysis

Directed and Undirected Networks

Directed network (nodes A, B, C):

  A B C
A 0 0 1
B 1 0 0
C 0 1 0

Undirected network (nodes A, B, C):

  A B C
A 0 1 1
B 1 0 1
C 1 1 0
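The difference between the two matrices can be checked mechanically: an undirected network has a symmetric matrix, a directed one generally does not. The sketch below assumes that rows are sources and columns are targets, which is a common convention but is not stated on the slide; the helper names are illustrative.

```python
LABELS = ["A", "B", "C"]

DIRECTED = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

UNDIRECTED = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]


def is_symmetric(matrix):
    """True when every link i -> j is matched by a link j -> i."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def edge_list(matrix, labels):
    """List (source, target) pairs, assuming row = source and column = target."""
    return [
        (labels[i], labels[j])
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value
    ]


if __name__ == "__main__":
    print(is_symmetric(DIRECTED), edge_list(DIRECTED, LABELS))
    print(is_symmetric(UNDIRECTED), edge_list(UNDIRECTED, LABELS))
```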

Page 6: Big social data analytics - social network analysis

Sociomatrix

      Jim  Bob  Alex  Tom
Jim    -    0    1     0
Bob    1    -    1     1
Alex   1    1    -     1
Tom    0    1    1     -

Relationship: is friend of

Source: Hoffman 2000; Moreno 1953

Sociometry: “the mathematical study of psychological properties of populations, the experimental technique of and the results obtained by application of quantitative methods” (Moreno, 1953, pp. 15-16).

Page 7: Big social data analytics - social network analysis

Direct and Indirect Paths (Friends/Connections/etc.)

Page 8: Big social data analytics - social network analysis

Nodes and Edges

Gephi

NodeXL

Page 9: Big social data analytics - social network analysis

Anatomy of Networks

Page 10: Big social data analytics - social network analysis

Network Metrics: Prominence

Prominence

• Centrality
– Degree Centrality
– Closeness Centrality
– Betweenness Centrality
– Information Centrality

• Prestige
– Degree Prestige
– Proximity Prestige
– Status or Rank Prestige

Source: Wasserman & Faust 1994

Page 11: Big social data analytics - social network analysis

Degree Centrality

• Degree: how many direct links a node has to other nodes

• In a directed network it is possible to calculate both indegree (incoming connections) and outdegree (outgoing connections)

Source: Wasserman & Faust 1994
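A minimal sketch of counting indegree and outdegree from a list of directed edges. The edge list and the function name are made up for illustration.

```python
from collections import Counter


def degree_counts(edges):
    """Return (indegree, outdegree) counters for a list of (source, target) edges."""
    indegree = Counter()
    outdegree = Counter()
    for source, target in edges:
        outdegree[source] += 1   # one more outgoing connection for the source
        indegree[target] += 1    # one more incoming connection for the target
    return indegree, outdegree


if __name__ == "__main__":
    example = [("A", "B"), ("A", "C"), ("B", "C")]
    indegree, outdegree = degree_counts(example)
    for node in sorted({n for edge in example for n in edge}):
        print(node, "in:", indegree[node], "out:", outdegree[node])
```

In an undirected network the two counts coincide, so a single degree value per node is enough.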

Page 12: Big social data analytics - social network analysis

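Closeness Centrality

• Closeness is based on the sum of the shortest-path lengths from a node to all other nodes in the network: $c_i = \sum_{j=1}^{n} d_{ij}$

• $d_{ij}$ is the length of the shortest path between nodes i and j

• Closeness centrality indicates how quickly a node can interact with other nodes: the smaller the sum, the closer the node is to the rest of the network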

Source: Wasserman & Faust 1994
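A minimal sketch of the distance sum for an unweighted, connected, undirected network, using breadth-first search to obtain the shortest-path lengths. The adjacency-dictionary format and the function names are assumptions for illustration; the normalised score (n - 1) divided by the sum is one common way to turn the sum into a value where higher means more central.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_sum(adj, node):
    """Sum of shortest-path lengths from node to all other nodes."""
    return sum(shortest_path_lengths(adj, node).values())


def closeness(adj, node):
    """Normalised closeness; assumes a connected graph with at least two nodes."""
    return (len(adj) - 1) / distance_sum(adj, node)


if __name__ == "__main__":
    # Star with hub "h" and six leaves.
    leaves = ["a", "b", "c", "d", "e", "f"]
    star = {"h": set(leaves)}
    star.update({leaf: {"h"} for leaf in leaves})
    print(distance_sum(star, "h"), closeness(star, "h"))
    print(distance_sum(star, "a"), closeness(star, "a"))
```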

Page 13: Big social data analytics - social network analysis

Betweenness Centrality

• Betweenness measures the degree to which a node lies on the shortest paths between pairs of other nodes

• Betweenness centrality indicates the ability of a node to control the flow of information between other nodes (gatekeeper)

• A node may not be locally central, but may still have a high betweenness centrality

Source: Wasserman & Faust 1994
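A minimal sketch for small, unweighted, undirected networks: for every pair of nodes it adds up the share of their shortest paths that pass through each third node. This is a brute-force illustration rather than the faster algorithm that network tools typically use; the adjacency-dictionary format and the names are assumptions.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (distance, number of shortest paths) from source to each reachable node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v] += paths[u]   # every shortest path to u extends to v
    return dist, paths


def betweenness(adj):
    """Unnormalised betweenness of every node in an undirected graph."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:  # v lies on a shortest s-t path
                score[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return score


if __name__ == "__main__":
    # Two triangles joined through node "x", which has only two links.
    graph = {
        "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
        "x": {"c", "d"},
        "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
    }
    for node, value in sorted(betweenness(graph).items()):
        print(node, value)
```

In the example, node "x" has the lowest degree among the connecting nodes yet the highest betweenness, which is the gatekeeper situation described above.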

Page 14: Big social data analytics - social network analysis

Network Analysis Process in Practice

• The network analysis process usually consists of the following four phases:
1. Interpreting the phenomena under investigation as a network
2. Collecting data
3. Cleaning and refining the data
4. Network layout and fine-tuning

Source: Huhtamäki & Parviainen 2015

Page 15: Big social data analytics - social network analysis

A process for visualization

Source: Card et al. 1999

Page 16: Big social data analytics - social network analysis

Visualization Stages

Visual and Cognitive Processing

Physical Environment

Social Environment

Data gathering

Data Preprocessing and transformation

Visualization Tool

Data manipulation

Data exploration

Source: Ware 2004

Page 17: Big social data analytics - social network analysis

OSTINATO Process Model for Visual Network Analysis

Source: Huhtamäki 2016

Page 18: Big social data analytics - social network analysis

Entity Recognition?

• Twitter provides natural identifiers for nodes (however, some nodes may be fake accounts or bots)

• In some other application areas, such as bibliographic data analysis, entity recognition is more problematic

• Entity recognition can be done in network visualization tools (e.g. Gephi Data Laboratory) or using third-party applications (e.g. OpenRefine)

Page 19: Big social data analytics - social network analysis

Entity Recognition in Gephi Data Laboratory

22.5.2017

Source Target

Page 20: Big social data analytics - social network analysis

Node and Edge Creation

DiGraph – Directed graphs with self loops

Each user mention creates an edge between users. For Twitter Mentions see: https://support.twitter.com/articles/14023#
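A minimal sketch of turning mentions into weighted directed edges. It assumes each tweet is available as an (author, text) pair and that an edge points from the author to the mentioned user; both are illustrative assumptions, as is the simple regular expression, which would also match the part after "@" in an e-mail address.

```python
import re
from collections import Counter

# Simplified pattern for @username; an assumption, not Twitter's official rule.
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count (author, mentioned) pairs; usernames are compared case-insensitively."""
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            edges[(author.lower(), mentioned.lower())] += 1
    return edges


if __name__ == "__main__":
    sample = [
        ("alice", "@bob have you seen this? cc @carol"),
        ("bob", "thanks @alice"),
        ("carol", "note to self @carol"),   # a self loop
        ("alice", "@Bob one more thing"),
    ]
    for (source, target), weight in sorted(mention_edges(sample).items()):
        print(source, "->", target, "weight", weight)
```

Repeated mentions increase the edge weight, and a user mentioning their own account produces a self loop, which is why a directed graph with self loops is needed.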

Page 21: Big social data analytics - social network analysis

Visual Properties Configuration

Node Partition by Modularity Class

Page 22: Big social data analytics - social network analysis

Layout Processing: Force-driven layout

• Layout refers to the act of placing the nodes on the canvas
• Force-driven layout is a straightforward option:
– Nodes repel each other
– Connections act as springs pulling the nodes back together
– The center of a gravitational field is placed in the middle of the canvas
– The process is run and configured iteratively until the visualizer is happy with the result

Source: Huhtamäki 2015
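The three forces listed above can be sketched in a few lines. This is a toy illustration under assumed constants (repulsion, spring, gravity, step cap), not the ForceAtlas2 algorithm that Gephi implements; all parameter names and default values are made up.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, seed=42,
                 repulsion=0.01, spring=0.05, gravity=0.01, max_step=0.1):
    """Return {node: [x, y]} after a fixed number of force-driven iterations."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # 1. Nodes repel each other (stronger when they are close).
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = max(dx * dx + dy * dy, 1e-9)  # avoid division by zero
                force[a][0] += repulsion * dx / d2
                force[a][1] += repulsion * dy / d2
        # 2. Connections act as springs pulling linked nodes together.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += spring * dx
            force[a][1] += spring * dy
            force[b][0] -= spring * dx
            force[b][1] -= spring * dy
        # 3. Gravity pulls every node towards the centre of the canvas (0, 0).
        for n in nodes:
            force[n][0] -= gravity * pos[n][0]
            force[n][1] -= gravity * pos[n][1]
        # Move each node, capping the step so the layout stays stable.
        for n in nodes:
            fx, fy = force[n]
            length = math.hypot(fx, fy)
            if length > max_step:
                fx, fy = fx / length * max_step, fy / length * max_step
            pos[n][0] += fx
            pos[n][1] += fy
    return pos


if __name__ == "__main__":
    layout = force_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
    for node, (x, y) in layout.items():
        print(node, round(x, 3), round(y, 3))
```

Changing the constants and re-running corresponds to the iterative tuning step: stronger repulsion spreads the network out, stronger gravity keeps disconnected nodes from drifting away.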

Page 23: Big social data analytics - social network analysis

Example

Source: Huhtamäki et al. 2012

The list of startups participating in the Tekes YIC program was scraped from the Tekes homepage.

The IEN Dataset was used to gather data on companies, investors, key individuals, and acquisitions.

Moreover, the Twitter usernames of the YIC companies were compiled in a spreadsheet in a semi-manual manner, and a tailored script was implemented to crawl the Twitter REST API to collect the list of followers of each YIC company with a Twitter account.

Page 24: Big social data analytics - social network analysis

Interactive Network Visualization

Source: Aramo-Immonen et al. 2016; Aramo-Immonen et al. 2015

http://www.tut.fi/novi/case/2015-cbh-cmadfi2014-informallearning/twomode/network/

Page 25: Big social data analytics - social network analysis

Hashtag Co-Occurrence Matrix

http://www.tut.fi/novi/case/2015-cbh-cmadfi2014-informallearning/hashtags/matrix/

Source: Aramo-Immonen et al. 2016; Aramo-Immonen et al. 2015
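A minimal sketch of how a hashtag co-occurrence count can be derived from tweet texts: two hashtags co-occur when they appear in the same tweet. The regular expression, the lower-casing of tags and the function name are assumptions for illustration and not necessarily what the cited study used.

```python
import re
from collections import Counter
from itertools import combinations

# Simplified hashtag pattern; an assumption for illustration.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_cooccurrence(texts):
    """Count how often each unordered pair of hashtags appears in the same tweet."""
    counts = Counter()
    for text in texts:
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for first, second in combinations(tags, 2):
            counts[(first, second)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Learning #sna with #gephi",
        "#Gephi #SNA #python workshop",
        "#python only",
    ]
    for pair, count in sorted(hashtag_cooccurrence(sample).items()):
        print(pair, count)
```

Each counted pair corresponds to one cell of a symmetric matrix whose rows and columns are hashtags.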

Page 26: Big social data analytics - social network analysis

Extraction of Twitter data and Network Visualization with Gephi

Page 27: Big social data analytics - social network analysis

Steps

• Collect the Twitter data
– Download the following script for extracting tweets: https://github.com/jjussila/BigSocialDataAnalytics/blob/master/scripts/search_trump.py
– Create a Twitter account, or borrow one from a friend, if you do not already have one
– Create a Twitter App: https://apps.twitter.com/
– Create a keychain.json file (that includes the necessary keys and tokens for accessing the data)
• Start running Python code online
– https://www.pythonanywhere.com/
• Install the following software
– Gephi https://gephi.org/ (for network visualization)

Page 28: Big social data analytics - social network analysis

Original Twitter-api script

Source: https://github.com/jukkahuhtamaki/pcm-demo/tree/master/twitter-api

Page 29: Big social data analytics - social network analysis

Modified script for extracting Twitter data

Source: https://github.com/jjussila/BigSocialDataAnalytics

Page 30: Big social data analytics - social network analysis

Become a Twitter Developer

Page 31: Big social data analytics - social network analysis

Create your first Twitter App

Page 32: Big social data analytics - social network analysis

Get the keys and tokens needed to access Twitter data

Page 33: Big social data analytics - social network analysis

Create keychain.json using template file

Copy and paste the necessary keys and tokens from the Twitter App and save the file as keychain.json
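A minimal sketch of reading keychain.json and checking that nothing was left empty before running the extraction script. The four field names below are hypothetical placeholders; use the names that appear in the template file of the repository.

```python
import json

# Hypothetical field names; the actual template file may use different ones.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def validate_keychain(keychain):
    """Raise ValueError if any required key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not keychain.get(key)]
    if missing:
        raise ValueError("keychain.json is missing: " + ", ".join(missing))
    return keychain


def load_keychain(path="keychain.json"):
    """Read the JSON file and validate its contents."""
    with open(path, encoding="utf-8") as handle:
        return validate_keychain(json.load(handle))


if __name__ == "__main__":
    try:
        keys = load_keychain()
        print("Loaded", len(keys), "entries")
    except (OSError, ValueError) as error:
        print("Could not load keychain:", error)
```

Keep keychain.json out of public repositories, since the keys and tokens give access to the Twitter account.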

Page 34: Big social data analytics - social network analysis

Example of extracting tweet data

Page 35: Big social data analytics - social network analysis

Modifying the script

Note:

%40 = ‘@’

%23 = ‘#’

For more details see: w3schools.com ASCII Encoding Reference
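Instead of typing the codes by hand, the search term can be percent-encoded with Python's standard library. A minimal sketch; the function name and the example search terms are illustrative.

```python
from urllib.parse import quote, unquote


def encode_query(term):
    """Percent-encode a search term so that '@' and '#' are safe inside a URL."""
    return quote(term, safe="")


if __name__ == "__main__":
    print(encode_query("@jjussila"))   # %40jjussila
    print(encode_query("#bigdata"))    # %23bigdata
    print(unquote("%23sna"))           # #sna
```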

Page 36: Big social data analytics - social network analysis

Network creation with NetworkX library

Source: NetworkX
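The course script relies on NetworkX for this step (its DiGraph class and write_gexf function). To show what ends up in the exported file without that dependency, the sketch below writes a small directed graph as GEXF using only the standard library. The element and attribute names follow the GEXF 1.2 format as commonly documented; treat the namespace string and the function name as assumptions, and prefer the NetworkX writer in practice.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the GEXF 1.2 draft specification.
GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(edges):
    """Build a GEXF tree from a dict mapping (source, target) to a weight."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(
        root, "graph", {"mode": "static", "defaultedgetype": "directed"}
    )
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")
    # Every user that appears as a source or a target becomes a node.
    for name in sorted({n for pair in edges for n in pair}):
        ET.SubElement(nodes_el, "node", {"id": name, "label": name})
    for index, ((source, target), weight) in enumerate(sorted(edges.items())):
        ET.SubElement(
            edges_el,
            "edge",
            {"id": str(index), "source": source, "target": target,
             "weight": str(weight)},
        )
    return ET.ElementTree(root)


if __name__ == "__main__":
    tree = build_gexf({("alice", "bob"): 2, ("bob", "alice"): 1,
                       ("carol", "carol"): 1})
    tree.write("network.gexf", encoding="utf-8", xml_declaration=True)
```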

Page 37: Big social data analytics - social network analysis

Using PythonAnywhere

Upload the following files:
- search_twitter.py
- keychain.json

Page 38: Big social data analytics - social network analysis

Running Python code on PythonAnywhere

Start a new console:

Bash

Page 39: Big social data analytics - social network analysis

Execute Python script in Bash console


Page 40: Big social data analytics - social network analysis

Using PythonAnywhere

Download the following files:
- network.gexf

Page 41: Big social data analytics - social network analysis

Open gexf (Graph Exchange XML Format) with Gephi

Page 42: Big social data analytics - social network analysis

Calculate the Network Metrics and Visualize the Network

Modularity Report (Community Detection Algorithm)
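To give an idea of what the modularity score in the report measures, the sketch below computes the modularity of a given partition of a small undirected, unweighted network. It does not search for communities (Gephi does that); it only scores a partition supplied by hand. The data format and names are assumptions for illustration.

```python
def modularity(adj, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    adj maps each node to the set of its neighbours;
    community maps each node to a community label.
    """
    twice_edges = sum(len(neighbours) for neighbours in adj.values())  # 2m
    q = 0.0
    for label in set(community.values()):
        members = [v for v in adj if community[v] == label]
        degree_sum = sum(len(adj[v]) for v in members)
        # Each edge inside the community is seen from both of its ends.
        internal = sum(
            1 for v in members for u in adj[v] if community[u] == label
        )
        # Observed share of internal links minus the share expected at random.
        q += internal / twice_edges - (degree_sum / twice_edges) ** 2
    return q


if __name__ == "__main__":
    # Two triangles connected by a single edge c-d.
    graph = {
        "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
        "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
    }
    split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(round(modularity(graph, split), 4))
```

A higher value means that links fall inside the communities more often than would be expected if the same degrees were wired at random; putting all nodes into one community gives a score of zero.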

Page 43: Big social data analytics - social network analysis

References

• Aramo-Immonen, H., Kärkkäinen, H., Jussila, J. J., Joel-Edgar, S., & Huhtamäki, J. (2016). Visualizing informal learning behavior from conference participants' Twitter data with the Ostinato Model. Computers in Human Behavior, 55, 584-595.

• Aramo-Immonen, H., Jussila, J., & Huhtamäki, J. (2015). Exploring co-learning behavior of conference participants with visual network analysis of Twitter data. Computers in Human Behavior, 51, 1154-1162.

• Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. ICWSM, 8, 361-362.

• Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in information visualization: using vision to think. Morgan Kaufmann.

• Huhtamäki, J. (2016). Ostinato Process Model for Visual Network Analytics: Experiments in Innovation Ecosystems. (Tampere University of Technology. Publication; Vol. 1425). Tampere University of Technology.

• Huhtamäki, J., Still, K., Isomursu, M., Russell, M., & Rubens, N. (2012, September). Networks of Growth: The Case of Young Innovative Companies in Finland. In Proceedings of the 7th European Conference on Innovation and Entrepreneurship: ECIE (p. 307). Academic Conferences Limited.

• Huhtamäki, J., & Parviainen, O. (2013). Verkostoanalyysi sosiaalisen median tutkimuksessa. Otteita verkosta-Verkon ja sosiaalisen median tutkimusmenetelmät. Vastapaino, Tampere.

• Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PloS one, 9(6), e98679.

• McSweeney, P. J. (2009). Gephi Network Statistics. Presented at Google Summer of Code. Retrieved from http://gephi.org/google-soc/gephi-netalgo.pdf

• Ware, C. (2013). Information visualization: perception for design (Third ed.). Elsevier.

• Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications (Vol. 8). Cambridge University Press.