Shalin Hai-Jew, Kansas State University, 2014 National Extension Technology Conference
Hashtag Conversations, Eventgraphs, and User Ego Neighborhoods: Extracting Social Network Data from Twitter
Shalin Hai-Jew, Kansas State University
2014 National Extension Technology Conference, May 2014
Presentation Overview
• This session introduces methods for extracting and analyzing social network data from Twitter (using NodeXL) for hashtag conversations (and emergent events), eventgraphs, search networks, and user ego neighborhoods. There will be direct demonstrations and discussions of how to analyze social network graphs. This information may be extended with human- and/or machine-based sentiment analysis.
Self-Intros
• Do you use Twitter? If so, how?
• Who do you follow on Twitter, and why?
• Have you analyzed your own social networks on Twitter? What’s the company you keep (online)?
• Have you ever created a hashtag for a formal conference event? Were you able to gain some insights about what your participants were experiencing during the conference?
• What would you like to learn in this session?
* My goal is for you to learn capability (what is fairly easily possible), not method. Method is for another day, another time.
Twitter Social Networking and Microblogging Social Media Platform
• 140-character, text-based Tweets
• Images (Twitpics) and videos (Vine)
• Accounts run by humans, 'bots (collecting and re-tweeting information, sensor networks), and cyborgs (humans and 'bots co-Tweeting)
• Created in 2006 and based in San Francisco, California
• 500 million registered users in 2012
• 340 million Tweets a day as the "SMS of the Internet"
• Has attracted a range of public, private, and governmental organizations; groups (religious, political, advocacy, and others); and individuals
• Has an application programming interface (API) that enables some limited access to its public data
Electronic Social Network Analysis
• Extraction of social network data from social media platforms (through their APIs): social networking sites, email systems, wikis, blogs, microblogging sites, web networks, and others
• Node-link (also called vertex-edge or entity-relationship) representations
• A form of structure mining with implications for
  • Organizational analysis
  • Entity (node) analysis
  • Social ties
  • Understandings of social structure and power
  • Diffusion of innovation, information, culture, attitudes, and other transmissible resources
  • Electronic event analysis
Some Basics of E-SNA (cont.)
• Core-periphery dynamic and influence (and power) / "primary" and "secondary" membership in the network
• Knowledge and influence
• Collection of resources
• Clustering
• Motif censuses, network structures, network topologies, geodesic distance, connectivity
• Bridging
• Network structure, network topology
• Thick ties / tight coupling in electronic social spaces
• Thin ties / loose coupling in electronic social spaces
• Homophily vs. heterophily
  • The company you keep
Some Basics of E-SNA (cont.)
Global Social Network Structures
• Betweenness centrality (shortest-path betweenness centrality)
• Closeness centrality (closeness of a node to all other nodes in the network graph)
• Eigenvector centrality (connectedness to important, well-connected neighbors)
• Clustering coefficient (the amount of clustering in a network)
Local Social Network Structures
• Degree centrality (in-degree and out-degree)
• Clustering coefficient (embeddedness)
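Several of these measures can be computed from a plain edge list. The following is a minimal Python sketch, assuming a small hypothetical "follows" network (`EDGES`, with made-up account names). It computes in-/out-degree, closeness (reachable nodes minus one, divided by the sum of shortest-path distances, which is one common normalization), and the local clustering coefficient on the undirected version of the graph. Betweenness and eigenvector centrality are left out for brevity, and NodeXL's own formulas and normalizations may differ.

```python
from collections import deque

# Hypothetical "follows" edges: (a, b) means account a follows account b.
EDGES = [
    ("ana", "ben"), ("ben", "ana"), ("ana", "cal"),
    ("cal", "ben"), ("dee", "ana"), ("dee", "cal"),
]


def degrees(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    indeg, outdeg = {}, {}
    for src, dst in edges:
        outdeg[src] = outdeg.get(src, 0) + 1
        indeg[dst] = indeg.get(dst, 0) + 1
        indeg.setdefault(src, 0)   # make sure every node appears in both maps
        outdeg.setdefault(dst, 0)
    return {n: (indeg[n], outdeg[n]) for n in indeg}


def undirected(edges):
    """Collapse directed edges into an undirected adjacency map (node -> neighbors)."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def closeness(adj, node):
    """(reachable nodes - 1) / sum of shortest-path distances, via breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def clustering(adj, node):
    """Local clustering coefficient: share of neighbor pairs that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))
```

In this sample, "ana" is one hop from every other account, so `closeness(undirected(EDGES), "ana")` is 1.0, while "dee" follows others but is followed by no one, giving an in-degree of 0.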
Units of Analysis
• Entity: node or vertex
• Relationships: links, edges
• Dyads, triads, … motifs (different relational structures)
• Clusters and sub-clusters (groups or meta-nodes)
• Islands
• Pendants (one node, one link); whiskers (a small group of nodes attached to the rest of the network by a single link)
• Isolates
• Ego neighborhoods
• Social network
• Multiple social networks
• "Big data" universes
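Several of these units of analysis fall out of simple degree counts and a connected-components pass. This is a minimal Python sketch over a hypothetical undirected network (`ADJ`, made-up node names): isolates have no links, pendants have exactly one, and each connected component is an island (the largest is usually the main network).

```python
# Hypothetical undirected network: node -> set of neighbors.
ADJ = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c"},                                          # pendant: exactly one link
    "e": {"f", "h"}, "f": {"e", "h"}, "h": {"e", "f"},   # a separate island (triad)
    "g": set(),                                          # isolate: no links
}


def isolates(adj):
    """Nodes with no links at all."""
    return sorted(n for n, nbrs in adj.items() if not nbrs)


def pendants(adj):
    """Nodes with exactly one link."""
    return sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)


def components(adj):
    """Connected components (islands) via iterative depth-first search, largest first."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return sorted(comps, key=len, reverse=True)
```

Here the largest component is the main network, the triad is an island, and "g" shows up both as an isolate and as a one-node component.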
Why Learn about Electronic Social Networks?
• Understand respective roles in the community
• Identify informally influential individuals who are otherwise hidden
• Monitor what messages are moving through the network to understand public sentiment and understanding
• Plan diffusion of prosocial information and actions; head off negative diffusions in a social network
• Wire new networks for social and individual resilience (such as regarding health, emotion, economics, and other areas)
• Rewire social networks for different objectives and aims; optimize social groups based on what is known about people's socializing and preferences
E-SNA on Twitter
• Hashtag conversations (#)
• Eventgraphs (unfolding formal and informal events, tracked by hashtags and key words)
• Search networks
• Understanding user (account) social networks
  • Ego neighborhoods on Twitter (direct alters)
  • Clusters and sub-clusters; islands; pendants; isolates
  • Motif censuses
  • Egos
Questions so Far?
• What do you think about (electronic) social network analysis (and structure mining)? Do you think its assumptions are valid? Why or why not?
Hashtag Conversations
• Narrow-casting (to a distinct small group) and broadcasting (communicating broadly to any who care to follow)
• Identifying the messages shared
  • Sentiments
  • Semantics
  • Main conversationalists
  • Calls to action
• Identifying the networks of accounts connected to each other around this discussion
• Observing the interactions between accounts (nodes or vertices) around the particular discussion
• Identifying the "mayor of your hashtag" (Dr. Marc A. Smith's phrasing): the influential discussants and their important (central, widely followed, re-tweeted) messaging
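One simple way to see the account network behind a hashtag conversation is to turn each tweet into author-to-account edges and count who receives the most. The following Python sketch assumes hypothetical captured tweets (`TWEETS`, made-up accounts and a made-up hashtag) and a simplified rule: a tweet starting with an @name is a reply to that account, and any other @name is a mention. Treating the most-addressed account as the "mayor" is an illustrative proxy (in-degree only), not NodeXL's method.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")

# Hypothetical captured tweets: (author, text).
TWEETS = [
    ("ana", "Great keynote at #exampleconf @ben"),
    ("cal", "@ben slides please? #exampleconf"),
    ("dee", "RT @ben: Data extraction demo at 2pm #exampleconf"),
    ("ben", "Thanks @ana! #exampleconf"),
    ("eve", "Lunch time"),
]


def mention_edges(tweets, tag):
    """Directed edges (author, target, kind) from tweets that carry `tag`.

    A tweet that starts with an @name counts as a reply to that account;
    every other @name counts as a mention. Self-mentions are skipped.
    """
    tag = tag.lstrip("#").lower()
    edges = []
    for author, text in tweets:
        if tag not in {t.lower() for t in HASHTAG.findall(text)}:
            continue
        for i, target in enumerate(MENTION.findall(text)):
            if target.lower() == author.lower():
                continue
            kind = "replies-to" if i == 0 and text.startswith("@") else "mentions"
            edges.append((author, target, kind))
    return edges


def mayor(edges):
    """Account receiving the most reply/mention edges (highest in-degree)."""
    counts = Counter(target for _, target, _ in edges)
    return counts.most_common(1)[0] if counts else None
```

For this sample, the off-topic tweet from "eve" is dropped, and `mayor(mention_edges(TWEETS, "#exampleconf"))` returns `("ben", 3)`.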
Eventgraphs
• Mapped networks of interactions based around a physical, virtual, or other event
• Formal, informal, or semi-formal
• Planned or unplanned events
  • Conferences with disambiguated or original hashtags; may include online or augmented reality games to increase participation (planned)
  • Accidents, mass health events, or unusual "spectacle" occurrences (unplanned)
• Micro (local or distributed) or mass (locationally clustered or distributed)
• Trending microblogging messages over time (exponential growth to a peak or multiple peaks, then gradual diminishment or a steep drop-off)
• Multimedia, with microblogged text, images, and video; interactive; dynamic
• Identification of the main geographical locations of the discussants
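The rise-peak-decline shape of an eventgraph shows up once tweet timestamps are bucketed over time. The following minimal Python sketch assumes hypothetical ISO-8601 timestamps (`STAMPS`) for tweets carrying an event hashtag. It normalizes them to UTC, counts tweets per hour, and reports the peak hour. Hours with no tweets simply do not appear in the output.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical timestamps of tweets carrying an event hashtag.
STAMPS = [
    "2014-01-01T14:05:00+00:00", "2014-01-01T14:40:00+00:00",
    "2014-01-01T15:02:00+00:00", "2014-01-01T15:10:00+00:00",
    "2014-01-01T15:55:00+00:00", "2014-01-01T17:30:00+00:00",
]


def hourly_volume(stamps):
    """Count tweets per UTC hour bucket, returned in chronological order."""
    buckets = Counter()
    for s in stamps:
        t = datetime.fromisoformat(s).astimezone(timezone.utc)
        buckets[t.replace(minute=0, second=0, microsecond=0)] += 1
    return sorted(buckets.items())


def peak_hour(volume):
    """(hour, count) pair with the highest count."""
    return max(volume, key=lambda item: item[1])
```

For this sample the hourly counts are 2, 3, and 1, and the peak falls in the 15:00 UTC hour.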
Search (Social) Networks (Online)
• Identification of
  • particular topics in discussion (the less ambiguous the term, the better; otherwise, the tools will track a broad range of terms with various word senses)
  • discussants (social media platform accounts)
  • main messaging of the discussants (Tweet or microblogging streams)
  • main physical locations of the discussants (based on noisy geo information)
User Social Networks
• Node / vertex / entity / agent analysis
• Link / edge / arc / tie / relationship analysis
• Identification of the alters in the ego neighborhood
• Analysis of transitivity among the alters in the ego neighborhood
• Capture of a 2-degree social network on Twitter
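The ego neighborhood, the ties among alters, and a 2-degree capture can be sketched with a breadth-first expansion. This Python example assumes a hypothetical undirected tie map (`ADJ`, made-up account names). `ego_network` returns everyone within `radius` hops of the ego, and `alter_density` measures how many of the possible alter-to-alter ties actually exist, which is the same quantity as the ego's local clustering coefficient.

```python
# Hypothetical undirected ties among Twitter accounts.
ADJ = {
    "ego": {"a1", "a2", "a3"},
    "a1": {"ego", "a2", "x1"},
    "a2": {"ego", "a1"},
    "a3": {"ego", "x2"},
    "x1": {"a1"},
    "x2": {"a3", "x3"},
    "x3": {"x2"},
}


def ego_network(adj, ego, radius=1):
    """Nodes within `radius` hops of ego (radius=2 mirrors a 2-degree crawl)."""
    frontier, members = {ego}, {ego}
    for _ in range(radius):
        frontier = {n for f in frontier for n in adj[f]} - members
        members |= frontier
    return members


def alter_density(adj, ego):
    """Share of possible alter-alter ties that exist (transitivity among alters)."""
    alters = sorted(adj[ego])
    k = len(alters)
    if k < 2:
        return 0.0
    ties = sum(1 for i in range(k) for j in range(i + 1, k) if alters[j] in adj[alters[i]])
    return ties / (k * (k - 1) / 2)
```

Here only one of the three possible alter pairs (a1 and a2) is tied, so the density is 1/3. Raising the radius to 2 pulls in "x1" and "x2" but not "x3", which is three hops away.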
Motif Censuses
• Understanding of the global nature of the network
• The power structures within the network
• The clusters, sub-clusters, islands, pendants, and isolates
• The social individuals and entities within the network
• The transmissibles moving through the network
• Static (vs. dynamic) information captures
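A motif census counts how often small relational structures occur. The simplest case is the dyad census of a directed network, which classifies every pair of accounts as mutual, asymmetric, or null. The Python sketch below is only an illustration of the counting idea on hypothetical follow edges (`EDGES`, `NODES`). NodeXL's motif grouping looks for larger patterns, so this is not its algorithm.

```python
from itertools import combinations

# Hypothetical directed "follows" edges and the full list of accounts.
EDGES = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")}
NODES = ["a", "b", "c", "d", "e"]


def dyad_census(nodes, edges):
    """Classify every unordered pair of nodes as mutual, asymmetric, or null."""
    census = {"mutual": 0, "asymmetric": 0, "null": 0}
    for u, v in combinations(nodes, 2):
        forward, backward = (u, v) in edges, (v, u) in edges
        if forward and backward:
            census["mutual"] += 1
        elif forward or backward:
            census["asymmetric"] += 1
        else:
            census["null"] += 1
    return census
```

With five accounts there are ten pairs. In this sample one pair is mutual (a and b), two are one-way (a to c, c to d), and seven have no tie.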
The Data Extraction and Network Visualization Tool: NodeXL (Network Overview, Discovery and Exploration for Excel)
Network Overview, Discovery and Exploration for Excel (NodeXL)
• NodeXL
  • Free and open-source code
  • Data scraping from social media platforms through their respective APIs (of publicly available information only)
  • Add-on to Excel (formerly known as NetMap)
  • Available on the Microsoft CodePlex platform
  • Requires Windows (or Parallels on a Mac)
  • Sponsored by the Social Media Research Foundation
  • NodeXL Graph Gallery for shared graphs and datasets
Types of Data Extractions from Twitter
NodeXL (relations, structure, select contents)
• #hashtag
• Search
• Twitter "List Network"
• Twitter User Network
NCapture of NVivo (semantics, message contents)
• Twitter User Tweets • Twitter List Tweets
Input Parameters
• Size of the crawl
• Degree of the crawl
• Image capture
• Tweet capture
• Direction (followed by / following / both)
• Edge definition: followed / following; replies-to; mentions
• Tweet column
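The size and degree parameters multiply quickly. As a worked estimate, if every account in a crawl contributes at most a fixed number of new alters, the worst-case number of accounts is 1 + N + N² + … up to the chosen degree. The Python sketch below assumes that simplified model. `cap_per_account` is a hypothetical stand-in for a per-account size limit, and real crawls come in smaller because alters overlap.

```python
def crawl_upper_bound(cap_per_account, degree):
    """Worst-case accounts reached when each account adds `cap_per_account` new alters.

    degree=1 -> ego plus direct alters; degree=2 adds the alters of alters.
    Real crawls are smaller because alters overlap and some accounts have fewer ties.
    """
    total, layer = 1, 1
    for _ in range(degree):
        layer *= cap_per_account  # size of the next ring outward
        total += layer
    return total
```

With a cap of 100, a 1-degree crawl reaches at most 101 accounts and a 2-degree crawl at most 10,101. With a cap of 600, a 2-degree crawl could reach 360,601, which is past the roughly 300,000-record practical ceiling noted under Limits.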
Data Processing: Graph Metrics
• Degree, in-degree, out-degree
• Betweenness and closeness centralities
• Eigenvector centrality
• Vertex clustering coefficient
• Vertex PageRank
• Edge reciprocation
• Words and word pairs
• Twitter search network top items
• …and others
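Vertex PageRank is one of the less intuitive metrics on this list. Roughly, an account scores highly when it is pointed to by other high-scoring accounts. This is a minimal power-iteration sketch in Python, assuming a directed edge list of hypothetical account names, the conventional damping factor of 0.85, and even redistribution of rank from accounts with no out-links. NodeXL's implementation details and settings may differ.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """PageRank by power iteration on a directed edge list (src -> dst).

    Nodes with no out-links ("dangling" nodes) spread their rank evenly over all nodes,
    so the scores always sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank
```

In a star where "b", "c", and "d" all point to "a", the account "a" ends up with the highest score. In a three-account cycle, every account keeps exactly one third.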
Data Processing: Grouping
• Group by vertex attribute
• Group by connected component
• Group by cluster
• Group by motif
Data Visualization
• Type of layout algorithm applied to the data
• Autofill
• Labeling of vertices
• Labeling of edges
• Graph pane
• Graph options
• Zoom
• Scale
Dynamic Filtering
• Adjust parameters (with the sliders) to limit what is visualized
• Change the time zones to analyze what is being communicated, by whom, and at what time (times are given in UTC, Coordinated Universal Time)
• Capture broadly, and then focus in using dynamic filtering
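Time-based filtering is easier to reason about once every timestamp is converted to UTC. This Python sketch assumes hypothetical edges (`EDGES`) whose timestamps carry their original local UTC offsets. It normalizes them and keeps only the edges inside a chosen UTC window, which is roughly what a time slider does.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical edges: (source, target, timestamp with its original UTC offset).
EDGES = [
    ("ana", "ben", "2014-01-01T09:15:00-06:00"),
    ("cal", "ben", "2014-01-01T16:40:00+01:00"),
    ("dee", "ana", "2014-01-01T20:05:00+00:00"),
]


def to_utc(stamp):
    """Parse an ISO-8601 timestamp with an offset and convert it to UTC."""
    return datetime.fromisoformat(stamp).astimezone(timezone.utc)


def filter_window(edges, start_utc, hours):
    """Keep edges whose UTC time falls in [start_utc, start_utc + hours)."""
    end = start_utc + timedelta(hours=hours)
    return [e for e in edges if start_utc <= to_utc(e[2]) < end]
```

The first two tweets look hours apart in local time (09:15 and 16:40) but both fall in the 15:00 to 16:00 UTC window, so `filter_window(EDGES, datetime(2014, 1, 1, 15, tzinfo=timezone.utc), 1)` keeps them and drops the third.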
Data Analysis
• Use both the dataset and the visualizations (they complement each other, and both are necessary for full understanding)
• Capture the Tweets column and import it into a text analysis software program
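Moving the Tweets column into a text analysis tool usually means saving it as plain text. The following Python sketch assumes the edge worksheet has been saved as CSV, and the column names ("Vertex 1", "Vertex 2", "Tweet") and sample rows are assumptions for illustration. Because a tweet that mentions several accounts can appear on several edge rows, exact duplicates are dropped so each tweet is counted once.

```python
import csv
import io

# Hypothetical CSV export of an edge worksheet; column names are assumptions.
SAMPLE_CSV = """Vertex 1,Vertex 2,Tweet
ana,ben,"Great keynote, @ben #exampleconf"
cal,ben,"@ben @dee slides please? #exampleconf"
cal,dee,"@ben @dee slides please? #exampleconf"
"""


def tweets_to_text(csv_text, column="Tweet"):
    """Return one tweet per line, skipping blanks and exact duplicates."""
    seen, lines = set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tweet = (row.get(column) or "").strip()
        if tweet and tweet not in seen:
            seen.add(tweet)
            lines.append(tweet)
    return "\n".join(lines) + "\n"
```

The result can then be written to a .txt file, for example with `open("tweets.txt", "w", encoding="utf-8").write(tweets_to_text(SAMPLE_CSV))`, where the file name is only an example.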
Limits -> Controlling for Input Parameters for the Data Extraction
• Social media platform (Twitter and its data processing rate limits), even with an account for "whitelisting" (and the time of day of the data extraction through its data-streaming API)
• NodeXL (up to about 300,000 records or so)
• Computational power of the researcher's machine
• Computer memory of the researcher's machine
• No early indicator of the size of the data crawl or the acquirability of the electronic social network
• Costly (in computation and time) non-captures at system limits
Addendum
• May apply Boolean operators in the query (and query multiple terms simultaneously)
• May use macros
• May re-crawl using the original parameters of a data extraction
• May automate data extractions
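Combining several terms in one query is mostly a matter of string composition. This Python sketch assumes Twitter-style search conventions: space-separated terms must all match, OR matches either term, and a leading minus excludes a term. Whether parenthesized grouping is honored depends on the search endpoint and tool, so treat the output format as an assumption. `build_query` and the sample tags are hypothetical.

```python
def build_query(all_of=(), any_of=(), none_of=()):
    """Compose a search string from required, alternative, and excluded terms."""
    parts = list(all_of)                      # every one of these must appear
    if len(any_of) == 1:
        parts.append(any_of[0])
    elif any_of:
        parts.append("(" + " OR ".join(any_of) + ")")  # at least one must appear
    parts.extend("-" + term for term in none_of)        # none of these may appear
    return " ".join(parts)
```

For example, `build_query(all_of=["extension"], any_of=["#tagA", "#tagB"], none_of=["spam"])` yields `extension (#tagA OR #tagB) -spam`.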
Some Sample Graph Visualizations from NodeXL Extractions from Twitter
Note: Other details have been excluded because these visualizations are incomplete without the graph metrics and other complementary data, and it would be misrepresentational to explain the contexts of the data crawls behind the social network graphs incompletely. All of these graphs may be found in fuller detail, some with downloadable datasets, on the NodeXL Graph Gallery. At the Graph Gallery, enter "SHJ" in the Search bar at the top right.
Grid
Circle Layout (Ring Lattice Graph)
Harel-Koren Fast Multiscale with Vertex Labels
Random Layout Algorithm, Images at the Vertices
Sugiyama Layout of Groups, Force-Based Overall Network Layout
Harel-Koren Fast Multiscale
Horizontal Sine Wave
Harel-Koren Fast Multiscale
Motif, Harel-Koren Fast Multiscale
Harel-Koren Fast Multiscale
Fruchterman-Reingold Layout, Partitioned
3D Fruchterman-Reingold Force-Based Graph
Circle Layout / Ring Lattice Graph at Group Level, Force-Based Layout at Network Level
Harel-Koren Fast Multiscale
Harel-Koren Fast Multiscale
Fruchterman-Reingold Layout, Imagery for Vertices
Random Layout of Groups, Force-Based Layout of Network with Combined Edges
Harel-Koren Fast Multiscale Layout at Cluster Level, Force-Based Layout at Network Level
Motifs Extraction (Census), Sugiyama Layout at Network Level
Harel-Koren Fast Multiscale for Groups, Force-Based Layout at Network Level
Clustering by Clauset-Newman-Moore, Network Layout with Harel-Koren Fast Multiscale
Motifs at Group Level, Spiral at Network Level
Random at Group Level, Packed Rectangles for Network
Harel-Koren Fast Multiscale for Clusters, Treemap Layout for Network
Horizontal Sine Wave Layout (in beta)
Harel-Koren Fast Multiscale
Sugiyama, Stacked Rectangles
Fruchterman-Reingold
Fruchterman-Reingold
Harel-Koren Fast Multiscale
Harel-Koren Fast Multiscale
Motif, Fruchterman-Reingold, on Grid
Grid, Imagery on Vertices
Multi-Sequence Mixed Visualization
And…
NodeXL Graph Server
• Continuous crawl based on a certain term or account for over a month
• Academic purposes only
• Must be requested through Dr. Marc A. Smith (Connected Action Consulting Group @ [email protected])
• No retroactive crawls (a limitation of Twitter)
NodeXL Beta Layouts
• Treemap
• Packed rectangles
• Force directed
Mixing Up Datasets
Twitter Data Grants
• Feb. 2014 • Twitter Engineering Blog
Other Sources
• Content-sharing sites (with public APIs)
  • YouTube
  • Flickr
• Social networking sites (with public APIs)
  • Facebook
  • LinkedIn
• Email networks
• Web networks
• Wiki networks
Semantic (Meaning) Analysis of a Tweet Stream Using NCapture (add-in to Google Chrome and MS Internet Explorer browsers) and NVivo (a qualitative and mixed methods data analysis tool)
(Partial) Twitter Feed Capture using NCapture of NVivo 10
Word Cloud based on Word Frequency Count from Twitter Feed (Gist)
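A word cloud or word frequency treemap rests on a plain word count after removing low-meaning words. This is a minimal Python sketch, assuming a short hypothetical stop-word list (real lists are much longer) and a simple tokenizer that keeps #hashtags and @mentions as single words, drops links, and ignores bare numbers. Tools such as NVivo apply their own tokenization and stop lists.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "at", "on", "is", "for", "rt"}


def word_frequencies(tweets, top=5):
    """Most common words across tweets, ignoring stop words, links, and bare numbers."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop links
        for word in re.findall(r"[#@]?\w+", text):          # keep #tags and @names whole
            if word not in STOP_WORDS and not word.isdigit():
                counts[word] += 1
    return counts.most_common(top)
```

For the three hypothetical tweets "The data demo is at 2 http://t.co/x", "Data and more data #sna", and "RT the #sna demo", the counts are data 3, demo 2, #sna 2, and more 1.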
Geolocation (Lat / Long) Data of Active Twitter User Accounts on a Tweet Stream / Feed
Word Similarity Analysis
Word Frequency Treemap (classical content analysis)
Word Search Word Tree (and Stemming)
Manual Analysis…through Coding, Categorizing, and Evaluation
• Data reduction
• Summary
• Matrix analysis
• Coding and analysis
Topic | Pro (sentiment) | Con (sentiment)
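Once a human coder has assigned each segment a topic and a sentiment, the matrix view is a simple tally. This Python sketch assumes hypothetical coded segments (`CODED`) using only the sentiment labels "pro" and "con". It builds the topic-by-sentiment counts and renders them in the Topic | Pro | Con layout.

```python
from collections import defaultdict

# Hypothetical coded segments: (topic, sentiment) assigned by a human coder.
CODED = [
    ("cost", "pro"), ("cost", "con"), ("cost", "con"),
    ("access", "pro"), ("access", "pro"),
]


def coding_matrix(coded):
    """Tally coded segments into a topic x {pro, con} matrix."""
    matrix = defaultdict(lambda: {"pro": 0, "con": 0})
    for topic, sentiment in coded:
        matrix[topic][sentiment] += 1  # assumes sentiment is "pro" or "con"
    return dict(matrix)


def render(matrix):
    """Plain-text table, one row per topic in alphabetical order."""
    rows = ["Topic | Pro | Con"]
    rows += [f"{t} | {c['pro']} | {c['con']}" for t, c in sorted(matrix.items())]
    return "\n".join(rows)
```

For this sample, "access" gets two pro segments and no con segments, while "cost" gets one pro and two con.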
Human-Machine Analysis
• Network Text Analysis Theory (language modeled as networks of words and relations)
• Semantic network
  • Nodes: concepts or ideas, ideational kernels
  • Links: statements, relationships (strength of relationship; directionality, such as agreement / disagreement or positive / negative; type of relation; sentiment)
  • Network: semantic map, union of all statements
• May be a one-mode network (all nodes of one type)
  • Concepts
• May be a multi-modal network (based on ontological coding with various mixes of node types)
  • Persons, places, concepts, sentiments, locations, and others
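A one-mode semantic network can be approximated by linking concepts that appear near each other in the text. This Python sketch uses a sliding co-occurrence window as a simple stand-in for statement extraction: two concepts are linked every time they fall within `window` consecutive tokens, and the count becomes the link strength. It assumes already-cleaned tokens, and it is not AutoMap's algorithm.

```python
from collections import Counter


def cooccurrence_network(tokens, window=3):
    """Undirected concept links weighted by co-occurrence within `window` tokens."""
    links = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:  # the next window-1 tokens
            if other != word:
                links[tuple(sorted((word, other)))] += 1
    return links
```

With `window=2` only adjacent tokens link, so `["twitter", "data", "network", "data"]` produces a "data"–"network" link of strength 2 and a "data"–"twitter" link of strength 1. Widening the window adds weaker, longer-range links.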
Human-Machine Analysis (cont.)
• Meta-network analysis based on a text corpus / merged text corpuses
• Drawn from unstructured natural language text data
• Identification of users (account holders on Twitter) and their interrelationships with others based on messaging, re-Tweeting, and following / not following
• May use Carnegie Mellon University's freeware text-mining tool AutoMap 3.0.10.18 on Windows (by the Center for Computational Analysis of Social and Organizational Systems, CASOS) (2001 – present)
• Graph visualizations in 2D and 3D made in ORA-NetScenes (CASOS)
Human-Machine Analysis (cont.)
• AutoMap… requires data pre-processing (setting parameters)
  • Requires text corpuses as .txt files (transcoding from .doc, .docx, .HTML, or other formats)
  • May combine multiple text sets (through merging); can then query the whole set or the individual text sets
  • May create "stop words" (or "delete") lists to de-noise data (with "stop words" like relative pronouns, personal pronouns, articles, conjunctions, and other words with less semantic meaning)
  • May use universal or domain-specific "thesauruses" to define, filter, and hone the meta-network extractions
  • Enables the defining of sentiment
  • Requires testing on a sample set and meta-network visualization to ensure the appropriateness of the data refinements
  • Involves the design of meta-networks and ontologies from the text corpuses
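The delete-list and thesaurus steps can be pictured as two passes over the tokens: drop low-meaning words, then map variant spellings and synonyms onto one concept. This Python sketch assumes a hypothetical delete list (`DELETE`) and generalization thesaurus (`THESAURUS`). It mirrors the idea of these pre-processing steps and does not reproduce AutoMap's file formats or behavior.

```python
import re

# Hypothetical delete list and generalization thesaurus.
DELETE = {"the", "a", "and", "of", "to", "we", "our"}
THESAURUS = {
    "k-state": "kansas_state_university",
    "ksu": "kansas_state_university",
    "tweets": "tweet",
    "tweeted": "tweet",
}


def preprocess(text):
    """Lower-case and tokenize, drop delete-list words, then map variants to one concept."""
    tokens = re.findall(r"[\w-]+", text.lower())  # keeps hyphenated names like k-state
    return [THESAURUS.get(t, t) for t in tokens if t not in DELETE]
```

For "We tweeted the K-State and KSU news", both spellings of the university collapse into a single concept, so a later concept network treats them as one node.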
Human-Machine Analysis (cont.)
• …requires data processing and data visualization
  • May run the textual data processing
  • Includes a web scraper for the main social media platforms in its ScriptRunner feature
• …requires data post-processing
  • Includes accessing AutoMap data from ORA-NetScenes to create network visualizations
  • Includes data "mining" for meaning / sense-making (identification of patterns)
  • Includes data visualization analysis
• Note: The work may require re-running this cycle multiple times for different data queries.
Sampler: Wordle™ Word Cloud to Create an Emergent Thesaurus
Sampler: Excerpt from a Year’s Worth of a Blog’s Text Corpus
Sampler: @kstate_pres Tweets Visualization
Demos?
• Would you like to see how to set up a simple data crawl from Twitter using NodeXL? (Note: Twitter rate limiting may prevent the data extraction from completing, but you can at least see what a basic setup looks like.)
• Any questions?
Conclusion and Contact
• Dr. Shalin Hai-Jew
  • Instructional Designer
  • Information Technology Assistance Center
  • Kansas State University
  • 212 Hale Library
  • 785-532-5262
  • [email protected]
• Thanks to Dr. Marc A. Smith, sociologist and Chief Social Scientist for Connected Action, for generously presenting a webinar at K-State to our faculty and staff. Also, Tony Capone, NodeXL developer, made the NodeXL beta available to me and has been very gracious and encouraging.