Social Network Analysis
Inforte course on Big Social Data Analytics 2017
Dr. Jari Jussila
Twitter: @jjussila
GitHub: https://github.com/jjussila/BigSocialDataAnalytics
Figure: From transactions to interactions. Items shown: ERP; CRM; Web; mobile and social media; purchase & transaction records; offers and quotations; customer engagements; A/B testing; dynamic pricing; search engine marketing and optimization; target marketing; customer segmenting; images and videos; speech to text; sensor data; application log data; SMS/MMS; location data; social media posts; social network analysis.
Network Analysis (NA) & Social Network Analysis (SNA)
Graph and Matrix Representation of Networks
Matrix representation of the three 7-node networks:

Star
0 1 1 1 1 1 1
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0

Circle
0 1 0 0 0 0 1
1 0 1 0 0 0 0
0 1 0 1 0 0 0
0 0 1 0 1 0 0
0 0 0 1 0 1 0
0 0 0 0 1 0 1
1 0 0 0 0 1 0

Chain
0 1 1 0 0 0 0
1 0 0 1 0 0 0
1 0 0 0 1 0 0
0 1 0 0 0 1 0
0 0 1 0 0 0 1
0 0 0 1 0 0 0
0 0 0 0 1 0 0
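Each matrix above can be generated from a simple edge list. The sketch below numbers the nodes 0-6 (rows and columns 1-7 in the matrices); the edge lists are illustrative choices that reproduce the three matrices, and the helper name `adjacency_matrix` is hypothetical.

```python
# Build 0/1 adjacency matrices for undirected networks from edge lists.
# Node numbering 0-6 corresponds to rows/columns 1-7 in the matrices above.

def adjacency_matrix(n, edges):
    """Return an n x n 0/1 adjacency matrix for an undirected graph."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: each tie appears at (i, j) and (j, i)
    return matrix

N = 7
STAR = [(0, k) for k in range(1, N)]                      # node 0 is the hub
CIRCLE = [(k, (k + 1) % N) for k in range(N)]             # 0-1-2-3-4-5-6-0
CHAIN = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]  # path 5-3-1-0-2-4-6

for name, edges in [("Star", STAR), ("Circle", CIRCLE), ("Chain", CHAIN)]:
    print(name)
    for row in adjacency_matrix(N, edges):
        print(" ".join(map(str, row)))
```

Because every tie is written in both directions, all three matrices are symmetric, which is what distinguishes them from the directed case.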
Directed and Undirected Networks
Directed network (nodes A, B, C)
A B C
A 0 0 1
B 1 0 0
C 0 1 0
Undirected network (nodes A, B, C)
A B C
A 0 1 1
B 1 0 1
C 1 1 0
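The two matrices above differ in one property: the undirected matrix is symmetric, the directed one is not. A minimal sketch, assuming the common convention that row i to column j means a tie from i to j (the source does not state the reading direction); the helper names are hypothetical.

```python
LABELS = ["A", "B", "C"]
DIRECTED = [[0, 0, 1],
            [1, 0, 0],
            [0, 1, 0]]
UNDIRECTED = [[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]]

def is_symmetric(matrix):
    """An undirected network always has a symmetric adjacency matrix."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

def symmetrize(matrix):
    """Drop direction: keep a tie between i and j if it exists either way."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]

def ties(matrix, labels):
    """List ties, reading row i -> column j (assumed convention)."""
    return [(labels[i], labels[j])
            for i, row in enumerate(matrix) for j, value in enumerate(row) if value]

print(is_symmetric(DIRECTED), is_symmetric(UNDIRECTED))  # False True
print(symmetrize(DIRECTED) == UNDIRECTED)                 # True
print(ties(DIRECTED, LABELS))
```

Symmetrizing the directed example happens to give exactly the undirected example, because every pair of nodes is linked in one direction or the other.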
Sociomatrix
        Jim   Bob   Alex   Tom
Jim      -     0     1      0
Bob      1     -     1      1
Alex     1     1     -      1
Tom      0     1     1      -
Relationship: is friend of
Source: Hoffman 2000; Moreno 1953
“the mathematical study of psychological properties of populations, the experimental technique of and the results obtained by application of quantitative methods” (Moreno, 1953, pp. 15-16).
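The sociomatrix is not symmetric: "is friend of" is a nomination, so a tie can be one-way. The sketch below separates mutual from one-way ties, assuming rows are the people who name a friend and columns the people named (an assumption; the slide does not say which way the matrix is read). The function name is hypothetical.

```python
NAMES = ["Jim", "Bob", "Alex", "Tom"]
# Sociomatrix above; None marks the "-" diagonal (no self-nominations)
FRIEND_OF = [[None, 0, 1, 0],
             [1, None, 1, 1],
             [1, 1, None, 1],
             [0, 1, 1, None]]

def mutual_and_one_way(matrix, labels):
    """Split ties into reciprocated pairs and one-way nominations (row -> column)."""
    mutual, one_way = [], []
    n = len(labels)
    for i in range(n):
        for j in range(n):
            if i == j or not matrix[i][j]:
                continue
            if matrix[j][i]:
                if i < j:  # report each reciprocated pair once
                    mutual.append((labels[i], labels[j]))
            else:
                one_way.append((labels[i], labels[j]))
    return mutual, one_way

mutual, one_way = mutual_and_one_way(FRIEND_OF, NAMES)
print("Mutual:", mutual)
print("One-way:", one_way)
```

Under this reading, the only unreciprocated nomination is Bob naming Jim.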
Direct and Indirect Paths (Friends/Connections/etc.)
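Direct ties are the 1s in an adjacency matrix; indirect two-step connections can be counted by multiplying the matrix by itself. In a network without self-loops, the off-diagonal entries of A² count the two-step paths between i and j (for example A–C–B), and the diagonal entries equal each node's degree. A minimal sketch using the undirected A, B, C example; the `matmul` helper is written out to avoid external libraries.

```python
def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

# Undirected network of A, B and C (rows/columns in that order)
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

A2 = matmul(A, A)  # entry (i, j) counts two-step walks from i to j
for row in A2:
    print(row)
# [2, 1, 1]
# [1, 2, 1]
# [1, 1, 2]
```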
Nodes and Edges: Gephi, NodeXL
Anatomy of Networks
Network Metrics: Prominence
Prominence
• Centrality: degree centrality, closeness centrality, betweenness centrality, information centrality
• Prestige: degree prestige, proximity prestige, status or rank prestige
Source: Wasserman & Faust 1994
• Degree: how many direct links a node has to other nodes
• In a directed network, both indegree (incoming connections) and outdegree (outgoing connections) can be calculated
Degree Centrality
Source: Wasserman & Faust 1994
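Degree, indegree and outdegree fall directly out of the adjacency matrix: outdegree is a row sum and indegree a column sum. The sketch reuses the "is friend of" sociomatrix with the diagonal set to 0 and assumes rows are the senders of ties; normalizing by n - 1 (the largest possible degree) is one common convention, not something the slide prescribes.

```python
NAMES = ["Jim", "Bob", "Alex", "Tom"]
# "is friend of" sociomatrix, "-" diagonal replaced by 0; row -> column assumed
FRIEND_OF = [[0, 0, 1, 0],
             [1, 0, 1, 1],
             [1, 1, 0, 1],
             [0, 1, 1, 0]]

def in_out_degree(matrix):
    """Return (indegree, outdegree) lists for a directed 0/1 adjacency matrix."""
    n = len(matrix)
    out_degree = [sum(row) for row in matrix]                            # row sums
    in_degree = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # column sums
    return in_degree, out_degree

def normalize(degrees):
    """Degree centrality divided by the maximum possible degree, n - 1."""
    n = len(degrees)
    return [d / (n - 1) for d in degrees]

in_deg, out_deg = in_out_degree(FRIEND_OF)
for name, i, o, c in zip(NAMES, in_deg, out_deg, normalize(out_deg)):
    print(f"{name}: indegree={i}, outdegree={o}, normalized outdegree={c:.2f}")
```

For an undirected network the row and column sums coincide, so a single degree value per node is enough.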
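• Closeness is based on the sum of the shortest-path lengths from a node to all other nodes in the network; closeness centrality is the inverse of this sum, so the shorter a node's paths to everyone else, the higher its closeness
• d_ij is the length of the shortest path between i and j
• Closeness centrality indicates how quickly a node can interact with other nodes

$$c_i = \left( \sum_{j=1,\, j \neq i}^{n} d_{ij} \right)^{-1}$$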
Closeness Centrality
Source: Wasserman & Faust 1994
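The shortest-path lengths d_ij in an unweighted network can be found with breadth-first search, and closeness is then the inverse of their sum. A minimal sketch, assuming a connected, unweighted, undirected network stored as a dictionary of neighbour lists (an illustrative data structure, not the one used by Gephi).

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: d_ij for every node j reachable from source i."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj):
    """c_i = 1 / sum_j d_ij; assumes a connected, unweighted network."""
    scores = {}
    for i in adj:
        total = sum(shortest_path_lengths(adj, i).values())  # d_ii = 0 adds nothing
        scores[i] = 1 / total if total else 0.0
    return scores

# 7-node star: node 0 is the hub, nodes 1-6 are leaves
STAR = {0: [1, 2, 3, 4, 5, 6], **{k: [0] for k in range(1, 7)}}
scores = closeness(STAR)
print(scores[0], scores[1])  # hub: 1/6, leaf: 1/(1 + 5 * 2) = 1/11
```

Multiplying each score by n - 1 gives the normalized variant, in which the hub of a star scores exactly 1.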
• Betweenness measures the extent to which a node lies on the shortest paths between pairs of other nodes
• Betweenness centrality indicates a node's ability to control the flow of information between other nodes (a gatekeeper role)
• A node may not be locally central, but may still have a high betweenness centrality
Betweenness Centrality
Source: Wasserman & Faust 1994
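Betweenness can be computed efficiently with Brandes' algorithm: one breadth-first search per node counts shortest paths, and a backward pass accumulates each node's share of them. The sketch below assumes an unweighted, undirected network stored as neighbour lists; the example graph (two triangles joined through node 4) is hypothetical and illustrates the last bullet: node 4 has only two links, yet it has the highest betweenness.

```python
from collections import deque

def betweenness(adj):
    """Brandes-style betweenness for an unweighted, undirected graph {node: [neighbours]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair was counted from both ends
    return {v: score / 2 for v, score in bc.items()}

BRIDGE = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
          4: [3, 5],
          5: [4, 6, 7], 6: [5, 7], 7: [5, 6]}
print(betweenness(BRIDGE))  # node 4 lies on all 9 shortest paths between the triangles
```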
Network Analysis Process in Practice
• The network analysis process usually consists of the following four phases:
1. Interpreting the phenomena under investigation as a network
2. Collecting data
3. Cleaning and refining the data
4. Network layout and fine-tuning
Source: Huhtamäki & Parviainen 2015
A process for visualization
Source: Card et al. 1999
Visualization Stages
Figure: Visualization stages (data gathering; data preprocessing and transformation; visualization tool; visual and cognitive processing; data exploration; data manipulation; physical environment; social environment).
Source: Ware 2004
OSTINATO Process Model for Visual Network Analysis
Source: Huhtamäki 2016
Entity Recognition?
• Twitter provides natural identifiers for nodes (however, some nodes may be fake accounts or bots)
• In other application areas, such as bibliographic data analysis, entity recognition is more problematic
• Entity recognition can be done in network visualization tools (e.g. Gephi Data Laboratory) or with third-party applications (e.g. OpenRefine)
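A simple form of entity recognition is merging labels that refer to the same entity. Twitter usernames are not case-sensitive, so "@Alice" and "alice" should become one node. The cleaning rule below (case-folding, trimming whitespace and a leading "@") and the example usernames are hypothetical; real data, especially bibliographic data, usually needs richer rules or a tool such as OpenRefine.

```python
def normalize_label(label):
    """Hypothetical cleaning rule: trim whitespace, drop a leading '@', case-fold."""
    return label.strip().lstrip("@").casefold()

def merge_duplicates(edges):
    """Merge edges whose endpoints normalize to the same labels; count them as weights."""
    merged = {}
    for source, target in edges:
        key = (normalize_label(source), normalize_label(target))
        merged[key] = merged.get(key, 0) + 1
    return merged

EDGES = [("@Alice", "bob"), ("alice ", "@BOB"), ("alice", "carol")]
print(merge_duplicates(EDGES))
```

Keeping the merged count as an edge weight preserves how often the relationship occurred.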
Entity Recognition in Gephi Data Laboratory
22.5.2017 19
Source Target
Node and Edge Creation
DiGraph – Directed graphs with self loops
Each user mention creates an edge between users. For Twitter mentions, see: https://support.twitter.com/articles/14023#
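Mention edges can be extracted from tweet text with a regular expression. This sketch assumes edges point from the tweet's author to the mentioned user, and it keeps self-mentions as self-loops, in line with the "directed graphs with self loops" note; the regex is a simplification of Twitter's username rules (up to 15 word characters), and the sample tweets are made up.

```python
import re

# "@" not preceded by a word character (skips e-mail addresses), 1-15 word characters
MENTION = re.compile(r"(?<!\w)@(\w{1,15})(?!\w)")

def mention_edges(tweets):
    """Return (author, mentioned) edges from (author, text) pairs; self-mentions kept."""
    edges = []
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            edges.append((author.lower(), mentioned.lower()))
    return edges

TWEETS = [
    ("alice", "Thanks @Bob and @carol!"),
    ("bob", "@alice @alice see you"),
    ("carol", "mail me at carol@example.com"),
    ("dave", "note to self @dave"),
]
print(mention_edges(TWEETS))
```

Repeated mentions produce repeated edges, which can later be aggregated into edge weights.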
Visual Properties Configuration
Node Partition by Modularity Class
Layout Processing: Force-driven layout
• Layout refers to the act of placing the nodes on the canvas
• Force-driven layout is a straightforward option:
  – Nodes repel each other
  – Connections act as springs, pulling the nodes back together
  – The center of a gravitational field is placed in the middle of the canvas
  – The process is run and configured iteratively until the visualizer is happy with the result
Source: Huhtamäki 2015
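The three forces listed above can be sketched in plain Python. This is a simplified Fruchterman-Reingold-style loop with an added gravity term, not Gephi's ForceAtlas2; the constants `k`, `gravity`, the step size, the cooling factor 0.98 and the random seed are arbitrary assumptions.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, gravity=0.1, seed=1):
    """Place nodes (a list) in 2D using repulsion, edge springs and central gravity."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    step = 0.1  # maximum move per iteration ("temperature")
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Nodes repel each other: force k^2 / distance
        for a, u in enumerate(nodes):
            for v in nodes[a + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = k * k / dist
                disp[u][0] += dx / dist * f
                disp[u][1] += dy / dist * f
                disp[v][0] -= dx / dist * f
                disp[v][1] -= dy / dist * f
        # Connections act as springs pulling endpoints together: force distance^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = dist * dist / k
            disp[u][0] -= dx / dist * f
            disp[u][1] -= dy / dist * f
            disp[v][0] += dx / dist * f
            disp[v][1] += dy / dist * f
        # Gravity pulls every node towards the middle of the canvas (0, 0)
        for v in nodes:
            disp[v][0] -= gravity * pos[v][0]
            disp[v][1] -= gravity * pos[v][1]
        # Move each node by at most `step`, then cool down
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy)
            if length > 0:
                move = min(length, step)
                pos[v][0] += dx / length * move
                pos[v][1] += dy / length * move
        step *= 0.98
    return {v: tuple(p) for v, p in pos.items()}

CHAIN = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]
positions = force_layout(list(range(7)), CHAIN)
for node, (x, y) in sorted(positions.items()):
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

The cooling step plays the role of the iterative fine-tuning: early iterations move nodes freely, later ones only adjust them.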
Example
Source: Huhtamäki et al. 2012
The list of startups participating in the Tekes YIC program was scraped from the Tekes homepage.
The IEN Dataset was used to gather data on companies, investors, key individuals, and acquisitions.
Moreover, the Twitter usernames of the YIC companies were compiled in a spreadsheet semi-manually, and a tailored script was implemented to crawl the Twitter REST API and collect the list of followers of each YIC company with a Twitter account.
Interactive Network Visualization
Source: Aramo-Immonen et al. 2016; Aramo-Immonen et al. 2015
http://www.tut.fi/novi/case/2015-cbh-cmadfi2014-informallearning/twomode/network/
Hashtag Co-Occurrence Matrix
http://www.tut.fi/novi/case/2015-cbh-cmadfi2014-informallearning/hashtags/matrix/
Source: Aramo-Immonen et al. 2016; Aramo-Immonen et al. 2015
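A hashtag co-occurrence matrix counts how often two hashtags appear in the same tweet. A minimal sketch with made-up tweets; the choices to lower-case hashtags and to count a hashtag only once per tweet are assumptions, not necessarily how the linked visualization was built.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"(?<!\w)#(\w+)")

def hashtag_cooccurrence(tweets):
    """Count unordered hashtag pairs that appear together in a tweet."""
    counts = Counter()
    for text in tweets:
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})  # once per tweet
        for a, b in combinations(tags, 2):
            counts[(a, b)] += 1
    return counts

TWEETS = ["#SNA and #Gephi rock", "#gephi #sna #bigdata", "only #bigdata"]
for (a, b), n in hashtag_cooccurrence(TWEETS).most_common():
    print(f"{a} - {b}: {n}")
```

Each counted pair can be read as a weighted, undirected edge between two hashtags, or as one cell of the co-occurrence matrix.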
Extraction of Twitter data and Network Visualization with Gephi
Steps
• Collect the Twitter data
  – Download the following script for extracting tweets: https://github.com/jjussila/BigSocialDataAnalytics/blob/master/scripts/search_trump.py
  – Create a Twitter account (or borrow one from a friend) if you do not already have one
  – Create a Twitter App: https://apps.twitter.com/
  – Create a keychain.json file that includes the keys and tokens needed to access the data
• Start running Python code online
  – https://www.pythonanywhere.com/
• Install the following software
  – Gephi https://gephi.org/ (for network visualization)
Original Twitter-api script
Source: https://github.com/jukkahuhtamaki/pcm-demo/tree/master/twitter-api
Modified script for extracting Twitter data
Source: https://github.com/jjussila/BigSocialDataAnalytics
Become a Twitter Developer
Create your first Twitter App
Get the keys and tokens needed to access Twitter data
Create keychain.json using template file
Copy and paste the necessary keys and tokens from the Twitter App and save the file as keychain.json
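Before running the script, it can help to check that keychain.json is valid JSON and that no value was left empty. The field names below are an assumption: Twitter apps issue a consumer key and secret plus an access token and secret, but the course template may name them differently. `parse_keychain` and `load_keychain` are hypothetical helpers.

```python
import json

# Assumed field names; adjust them to match the keychain.json template
REQUIRED_FIELDS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")

def parse_keychain(text):
    """Parse keychain JSON text and check that every required value is filled in."""
    keychain = json.loads(text)
    missing = [name for name in REQUIRED_FIELDS if not keychain.get(name)]
    if missing:
        raise ValueError("keychain.json is missing values for: " + ", ".join(missing))
    return keychain

def load_keychain(path="keychain.json"):
    """Read and validate the keychain file."""
    with open(path, encoding="utf-8") as f:
        return parse_keychain(f.read())
```

Failing early with a clear message is easier to debug than an authentication error from the API.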
Example of extracting tweet data
Modifying the script
Note:
%40 = ‘@’
%23 = ‘#’
For more details, see: w3schools.com ASCII Encoding Reference
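Instead of looking up the codes by hand, Python's standard library can percent-encode a search query. The query strings below are illustrative only.

```python
from urllib.parse import quote, urlencode

print(quote("@", safe=""))            # %40
print(quote("#", safe=""))            # %23
print(quote("#bigdata"))              # %23bigdata
print(urlencode({"q": "@jjussila"}))  # q=%40jjussila
```

`urlencode` builds a complete `key=value` query string, which avoids forgetting to encode one of the special characters.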
Network creation with NetworkX library
Source: NetworkX
Using PythonAnywhere
Upload the following files:
- search_twitter.py
- keychain.json
Running Python code on PythonAnywhere
Start a new console:
Bash
Execute Python script in Bash console
Using PythonAnywhere
Download the following files:
- network.gexf
Open gexf (Graph Exchange XML Format) with Gephi
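With NetworkX, a graph can be saved for Gephi with `networkx.write_gexf(G, "network.gexf")`. To show what such a file contains, the sketch below writes a minimal GEXF 1.2 document with only the standard library; it covers nodes and edges only (no attributes or weights), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def to_gexf(edges, directed=True):
    """Serialize (source, target) string pairs as a minimal GEXF 1.2 document."""
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph",
                          {"defaultedgetype": "directed" if directed else "undirected"})
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")
    for node in sorted({n for edge in edges for n in edge}):
        ET.SubElement(nodes_el, "node", {"id": node, "label": node})
    for i, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {"id": str(i), "source": source, "target": target})
    return ET.tostring(gexf, encoding="unicode")

def write_gexf(edges, path="network.gexf"):
    """Write the GEXF document, with an XML declaration, to a file Gephi can open."""
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(to_gexf(edges))

print(to_gexf([("alice", "bob"), ("bob", "alice"), ("dave", "dave")]))
```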
Calculate the Network Metrics and Visualize the Network
Modularity Report (Community Detection Algorithm)
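Modularity measures how much denser the links inside communities are than expected at random: Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j), where m is the number of edges and k_i the degree of node i. Gephi's Modularity statistic searches for a partition with high Q using the Louvain method; the sketch below only scores a given partition of an undirected, unweighted network, using the equivalent per-community form Σ_c [L_c/m − (d_c/2m)²]. The example graph is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum over communities of L_c/m - (d_c / 2m)^2 (undirected, unweighted)."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by the edge 3-4
EDGES = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
TWO_GROUPS = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
ONE_GROUP = {v: "A" for v in range(1, 7)}
print(modularity(EDGES, TWO_GROUPS))  # 5/14, about 0.357
print(modularity(EDGES, ONE_GROUP))   # 0 for a single community
```

Splitting along the triangles scores higher than keeping everything together, which is the kind of partition the Modularity Class coloring in Gephi reveals.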
References
• Aramo-Immonen, H., Kärkkäinen, H., Jussila, J. J., Joel-Edgar, S., & Huhtamäki, J. (2016). Visualizing informal learning behavior from conference participants' Twitter data with the Ostinato Model. Computers in Human Behavior, 55, 584-595.
• Aramo-Immonen, H., Jussila, J., & Huhtamäki, J. (2015). Exploring co-learning behavior of conference participants with visual network analysis of Twitter data. Computers in Human Behavior, 51, 1154-1162.
• Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: An open source software for exploring and manipulating networks. ICWSM, 8, 361-362.
• Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. Morgan Kaufmann.
• Huhtamäki, J. (2016). Ostinato process model for visual network analytics: Experiments in innovation ecosystems (Tampere University of Technology Publication, Vol. 1425). Tampere University of Technology.
• Huhtamäki, J., Still, K., Isomursu, M., Russell, M., & Rubens, N. (2012, September). Networks of growth: The case of young innovative companies in Finland. In Proceedings of the 7th European Conference on Innovation and Entrepreneurship: ECIE (p. 307). Academic Conferences Limited.
• Huhtamäki, J., & Parviainen, O. (2013). Verkostoanalyysi sosiaalisen median tutkimuksessa. In Otteita verkosta - Verkon ja sosiaalisen median tutkimusmenetelmät. Vastapaino, Tampere.
• Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS ONE, 9(6), e98679.
• McSweeney, P. J. (2009). Gephi network statistics. Presented at Google Summer of Code. Retrieved from http://gephi.org/google-soc/gephi-netalgo.pdf
• Ware, C. (2013). Information visualization: Perception for design (3rd ed.). Elsevier.
• Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications (Vol. 8). Cambridge University Press.