Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Kathleen M. Carley
1
Center for Computational Analysis of Social and Organizational Systems
http://www.casos.cs.cmu.edu/
Tracking Change in the Cyber-Mediated World
Prof. Kathleen M. Carley
Big Data is Every Where
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 2
Kathleen M. Carley
2
Expected Value
• Crisis response– Timely & detailed that can save lives
• State stability– Increase resilience, increase monitoring, leaderless coordination
• Security– Early identification of threats, increase threat
• Health– Improved and customized medicine, self managed care
• Social Cyber– Increase information wars, increase detection,
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 3
Supporting Technologies… in Workflows
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 4
• Social and Dynamic Network Analytics– Graph metrics & algorithms– Statistical metrics &
algorithms– Simulation
• Visual Analytics• Text Analytics• Machine Learning• Simulation Analysis and forecasting of who communicates,
influences or did / will do what to whom - when, how, and why
NetMapper
Construct
Kathleen M. Carley
3
Methodological Challenges
• “wide” data– Few cases - lots of variables– Tools assume – lots of cases few variables
• Sampled data– Available data may have 30% or more missing, and biases are not known– Metrics assume < 10% missing data, known biases
• Irrelevant data– Most data is noise not signal– Data cleaning is ad-hoc and warps results
• Geo-temporal data– Data is “big data” and it varies with time and location– Most tools metrics assume
• Little data but many time periods• Lots of data but one time period• Controls for either location or time, not both• Spatial and temporal analytics are very computationally costly
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 5
SAMPLED DATA, GEO-TEMPORAL
Tsunami Warning and Response
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 6
Kathleen M. Carley
4
Less than 30 Minutes!
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 7
Proposed Tsunami Warning and Response System
Pressure Sensor
Detection
AccousticModem
Jakarta BMKG
Shore Island Station
Padang EOC
Acoustic Network
))) (((
Alert Sensor Data
Verified Info
InaTewsSystem Alert - verbal
Raspberry PieGeneral
Twitter Population
PiTTSees
CASOSWeb
LeaderCell Phone
Follower Cell Phone
Local Info
BMKG Tweet Route Shelter Info
Population, Twitter, and official Tweet Info
Route Shelter Info
Verified Info
Local Info
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 8
Kathleen M. Carley
5
Tsunami Warning and Response Social Media System
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 9
Illustration of Where Tweets are Coming From
• 7 days• Overall• By day or hour - different
• Heatmap of number of tweets by area
0
2000
4000
6000
8000
10000
12000
14000
16000
18000
Sund
ay_0
0Su
nday
_07
Sund
ay_1
4Su
nday
_21
Mon
day_
04M
onda
y_11
Mon
day_
18Tu
esda
y_01
Tues
day_
08Tu
esda
y_15
Tues
day_
22W
edne
sday
_05
Wed
nesd
ay_1
2W
edne
sday
_19
Thur
sday
_02
Thur
sday
_09
Thur
sday
_16
Thur
sday
_23
Frid
ay_0
6Fr
iday
_13
Frid
ay_2
0Sa
turd
ay_0
3Sa
turd
ay_1
0Sa
turd
ay_1
7
Tweet Count Weekday HourReport meanTweet Count Weekday HourReport minTweet Count Weekday HourReport max
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 10
Kathleen M. Carley
6
Who is Most Likely to Know Others/Influence and Respect
Size = Out-Degree
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 11
Key Users
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 12
Kathleen M. Carley
7
GEO-TEMPORAL DATA
Understanding State Stability
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 13
BenghaziBBC2200: Attackers open fire.22:15: Assailants gain entry to complex main building engulfed in flames.22:45: Security staff try to retake come under heavy fire retreat.23.20: Attempt 2 to retakeSuccessFighting moves to the annex.Midnight: Fighting continues at annex – 2 die01:15: Mr Stevens arrives at hospital Dies from smoke inhalation.02:30: Security forces regain control of annex.
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 14
Kathleen M. Carley
8
Tweets Per Hournote Egyptian attack is against a background of violent events, Libya and
Yemen are anomalous
• News and Twitter peak near the same time• Strong periodicity to the data • Dramatic variation by country• Most retweeted are news but they are small fraction of actors
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 15
Overall Tweet NetworkNote there are a few sources that are picked up
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 16
Kathleen M. Carley
9
Benghazi Consulate
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 17
Trust & Source
Adaptation
Nationalities
Protest
Terrorist O
rgs
Ethnic Groups
Violence
War
Youth
Jordan 0 ‐1 0 2 1 0 0 2
Kuwait 2 2 0 0 ‐2 0 0 0
Saudi Arabia 2 0 0 2 ‐1 0 0 2
UAE 2 0 0 0 ‐2 0 0 2
Yemen ‐2 2 1 1 2 2 ‐1 1
Bahrain 0 0 1 0 ‐1 2 0 2
Egypt 1 0 1 1 ‐2 2 0 1
Iran 0 1 0 ‐2 1 0 ‐2 0
Iraq 0 0 0 ‐1 ‐2 0 ‐1 0
Lebanon 0 1 0 ‐1 1 2 2 0
Libya 1 1 0 0 0 0 ‐1 0
Syria 0 1 1 2 0 1 ‐2 0
Tunisia 1 1 2 0 ‐2 2 0 1
‐2 twitter and not news
‐1 more twitter than news
0 neither
1 more news than twitter
2 news and not twitter
• Low civil unrest• News & Twitter distinct• Ethnic groups more
discussed in twitter• High civil unrest
• More twitter on war, ethnics, terrorists
• Less reliance on newsSeptember 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 18
Kathleen M. Carley
10
Relation of Media Users to Revolutionary Activity
• Similar number high similarity users
• Low Civil unrest – personalized discussion– More users, more users
with high similarity in non-Arabic tweet, more “social interaction” through replies in small groups
• High Civil unrest – get the message out!– Lower similarity among
non-Arabic tweeters, less “social” interaction and more inciting through retweets
19
0
5000
10000
15000
20000
25000
30000
35000
Users HighlySimilar
AllTopics
HighlySimilar
inEnglishTopics
ReplyUsers
RetweetUsers
Low Civil Unrest High Civil Unrest
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU
Varying View of Terrorist and Association with Nationalities
News Twitter
High Instability
Low Instability
Kathleen M. Carley
11
WIDE DATACyber Attacks
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 21
Problem and Approaches
• Small Signal, often spatially and temporally distributed• Ocean of data• Interesting “stuff” is in the middle
• Spectral Clustering– Combine different kinds of things such as network position and
use of certain words• Lasso
– Often used in machine learning – to identify the network of connections among variables
• Geo-temporal visualization– Movies!
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 22
Kathleen M. Carley
12
Web Site Threats Encountered
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 23
Exploits: Countries that act as wayports
Argentina NigerDem. Rep of CongoAngola
LatviaMoldovaGeorgiaCroatia
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 24
Kathleen M. Carley
13
Cyber Connection
• Former USSR mostly attack each other in Cyber space
• Strongest attacks are on Russia, and from Ukraine
• In social media Russia mainly talks to itself and Belarus
• Ukraine, Estonia, Lithuania., Latvia use social media to build NATO sympathy and share approach
• Dirty bombs may be associated with cyber attacks
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 25
News – Increase in Cyber Attacks
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 26
Increase Decrease No Impact
Kathleen M. Carley
14
IRRELEVANT DATAExtremist Groups
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 27
Who’s on Social Media?
Organizations Individuals
BOTS
Kathleen M. Carley
15
Following Graph
Social Media Feature Integration
Dimensional reduction through extraction of the k lead Eigenvectors of each graph
Mention Graph Hash Tag Graph
An x m
Fn x k
Mn x k
Hn x k
… … …Zn x (m+4k)
=
Data Representation: Feature Integration
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 29
Detection/Classification
Iterative Classification
• Accounts with high following and mention degree are often falsely included in detected communities
• Celebrity, news media, political leader and corporate accounts must be systematically removed through classification
Case Study Work Flow Search Collect Extract Features
DETECT: Remove Official
Accounts
CLUSTER : Develop
Training Set
DETECT: Extremist
Community
CLUSTER : Identify Community Sub-
Structure Analyze / Intervene
Supervised Learning Task:
Unsupervised Learning Task:
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 30
Kathleen M. Carley
16
ISIS Social Media Network
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 31
Reciprocal Mentions: TrustArabic tweeters have since
been suspended
Group’s activity driven by eventsJoining is through following Imams
Step 1. Botnet Promotion: • Core Firibi Bot tweet 100-200 times over a
30-100 day time period mentioning other core members, bot benefactors, and highly central members of the OEC
Step 2: by mentioning Firibi Benefactors, other Core Bots, and highly central members of the OEC, the botnet gains followers from within the OEC and is able to promote the benefactor accounts.
Example: Firibi Benefactor
Example: Core Firibi Bot
Example: Firibi Follower
App Sign Up, solicits donations for children of Syria
Manipulating Community Structure
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 32
Kathleen M. Carley
17
Estimates – 25 to 50% of tweeters not human
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 33
Change of Number of Tweeters after Suspended Users are Removed
• After Oct2012 or 10% or more tweeters are suspended
• Algeria often has 20-25% tweeters suspended
• Change in political reality
– more violent states have fewer suspended
– Less violent states have more suspended
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 34
Kathleen M. Carley
18
Change of Closeness after Suspended Users are Removed
• Removing suspended tweeters has mixed impact
• In Qatar and Kuwait there is a .5 increase in closeness
• In Algeria closeness drops sometimes by .5
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 35
Impact on Network Metrics
• 25% nodes removed – chance you predicted top 3 correct drops to 60%
• 25% extra nodes (undetected bots) – chance you predicted top 3 correct drops to 80%
NODE REMOVE NODE ADD
September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 36
Kathleen M. Carley
19
Summary
• Data is everywhere• New methodologies are changing the questions that can
now be answered• Most new approaches are focused on low hanging fruit
– Counting things, trends • A deeper understanding of the data is needed to know
what it really represents – sampling• Move from geo-correlation to geo-interpretation• Move from temporal-correlation to temporal forecasting• Move from standard statistics to wide data techniques• Move from standard social network analysis to high
dimensional networksSeptember 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 37
Related Papers• Kathleen M. Carley, Momin Malik, Pater M. Landwehr, Jürgen Pfeffer, and Michael Kowalchuck, 2016, in press
“Crowd Sourcing Disaster Management: The Complex Nature of Twitter Usage in Padang Indonesia.” Safety Science.
• Peter M. Landwehr, Wei Wei, Michael Kowalchuck and Kathleen M. Carley, 2016 in press, “Using Tweets to Support Disaster Planning, Warning and Response.” Safety Science . DOI: 10.1016/j.ssci.2016.04.012
• Kathleen M. Carley, Wei Wei and Kenneth Joseph, Nov 2015, “High Dimensional Network Analytics: Mapping Topic Networks in Twitter Data During the Arab Spring” In Shuguan Cui, Alfred Hero, Zhi-Quan Luo and Jose Moura (eds) Big Data over Networks, Cambridge University Press.
• Will Frankenstein, Binxuan Huang, Kathleen M. Carley, 2016, “NATO Trident Juncture on Twitter: Public Discussion,” Carnegie-Mellon University, School for Computer Science, Institute for Software Research, Pittsburgh, Pennsylvania, Technical Report CMU-ISR-16-100.
• Kenneth Joseph, Peter M. Landwehr, Kathleen M. Carley, 2014, Two 1%s Don't make a whole: Comparing simultaneous samples from Twitter's Streaming API, In Proceedings 7th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, SBP 2014; Washington, DC; United States; 1 April 2014 through 4 April 2014, Lecture notes in Computer Science, 8393: 75-83.
• Wei Wei, Kenneth Joseph, Huan Liu, and Kathleen M. Carley, 2015, “The Fragility of Twitter Social Networks Against Suspended Users.” In Proceedings of 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), August 25-28, 2015, Paris, FR.
• Kathleen M. Carley, Wei Wei and Kenneth Joseph, Nov 2015, “High Dimensional Network Analytics: Mapping Topic Networks in Twitter Data During the Arab Spring” In Shuguan Cui, Alfred Hero, Zhi-Quan Luo and Jose Moura (eds) Big Data over Networks, Cambridge University Press.
• Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley, 2013, “Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose, In proceedings of International AAAI Conference on Weblogs and Social Media (ICWSM),” July 8-10, Boston, MA.
• Kathleen M. Carley, Jürgen Pfeffer, Fred Morstatter, and Huan Liu. 2014. Embassies Burning: Toward a Near Real Time Assessment of Social Media Using Geo-Temporal Dynamic Network Analytics, Social Network Analysis and Mining, 4:195.
Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMUSeptember 2016 38