Big Data is Every Where - University of Illinois, publish.illinois.edu/frontiers-big-data-symposium/files/2016/10/CarleyUIUCslides.pdf


Kathleen M. Carley

Center for Computational Analysis of Social and Organizational Systems

http://www.casos.cs.cmu.edu/

Tracking Change in the Cyber-Mediated World

Prof. Kathleen M. Carley

[email protected]

Big Data is Every Where

September 2016 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU 2

Expected Value

• Crisis response
  – Timely & detailed that can save lives
• State stability
  – Increase resilience, increase monitoring, leaderless coordination
• Security
  – Early identification of threats, increase threat
• Health
  – Improved and customized medicine, self-managed care
• Social Cyber
  – Increase information wars, increase detection,

Supporting Technologies… in Workflows

• Social and Dynamic Network Analytics
  – Graph metrics & algorithms
  – Statistical metrics & algorithms
  – Simulation
• Visual Analytics
• Text Analytics
• Machine Learning
• Simulation

Analysis and forecasting of who communicates, influences or did / will do what to whom - when, how, and why

NetMapper

Construct

Methodological Challenges

• “Wide” data
  – Few cases, lots of variables
  – Tools assume lots of cases, few variables
• Sampled data
  – Available data may have 30% or more missing, and biases are not known
  – Metrics assume < 10% missing data, known biases
• Irrelevant data
  – Most data is noise, not signal
  – Data cleaning is ad hoc and warps results
• Geo-temporal data
  – Data is “big data” and it varies with time and location
  – Most tools and metrics assume
    • Little data but many time periods
    • Lots of data but one time period
    • Controls for either location or time, not both
    • Spatial and temporal analytics are very computationally costly

SAMPLED DATA, GEO-TEMPORAL

Tsunami Warning and Response

Less than 30 Minutes!

Proposed Tsunami Warning and Response System

[Figure: system diagram. Components: Pressure Sensor, Detection, Acoustic Modem, Acoustic Network, Shore Island Station, Jakarta BMKG, Padang EOC, InaTEWS System, Raspberry Pi, General Twitter Population, PiTTSees, CASOS Web, Leader Cell Phone, Follower Cell Phone. Labeled flows: Alert, Sensor Data, Verified Info, Alert - verbal, Local Info, BMKG Tweet, Route Shelter Info, and Population, Twitter, and official Tweet Info.]

Tsunami Warning and Response Social Media System

Illustration of Where Tweets are Coming From

• 7 days
• Overall
• By day or hour - different
• Heatmap of number of tweets by area

[Chart: tweet count by weekday and hour. Y-axis: 0 to 18000 tweets. X-axis: weekday_hour buckets from Sunday_00 to Saturday_17, labeled in 7-hour steps. Series: Tweet Count Weekday Hour Report mean, min, and max.]
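The weekday-hour summary in the chart can be sketched roughly as follows. This is only an illustration, not the code behind the slides: the input layout (a timestamp plus an area label per tweet), the function names, and the choice to take mean, min, and max across areas are all assumptions, since the slide does not say what the three series are aggregated over.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean


def weekday_hour_key(ts: datetime) -> str:
    # Bucket label in the same style as the chart axis, e.g. "Sunday_07".
    return f"{ts.strftime('%A')}_{ts.hour:02d}"


def summarize(tweets):
    """tweets: iterable of (timestamp, area) pairs (hypothetical layout).

    Returns {bucket: (mean, min, max)} of per-area tweet counts.
    Areas with no tweets in a bucket are simply absent from that bucket.
    """
    counts = Counter((weekday_hour_key(ts), area) for ts, area in tweets)
    per_bucket = defaultdict(list)
    for (bucket, _area), n in counts.items():
        per_bucket[bucket].append(n)
    return {b: (mean(v), min(v), max(v)) for b, v in per_bucket.items()}


# Tiny made-up sample; 2016-09-04 is a Sunday.
sample = [
    (datetime(2016, 9, 4, 7, 5), "area_a"),
    (datetime(2016, 9, 4, 7, 40), "area_a"),
    (datetime(2016, 9, 4, 7, 10), "area_b"),
    (datetime(2016, 9, 5, 11, 0), "area_a"),
]
summary = summarize(sample)
```

Because empty area-hour cells are dropped rather than counted as zero, the minimum here is the smallest non-zero count; a real analysis would need to decide whether silent areas should count.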

Who is Most Likely to Know Others/Influence and Respect

Size = Out-Degree

Key Users

GEO-TEMPORAL DATA

Understanding State Stability

Benghazi (BBC)
22:00: Attackers open fire.
22:15: Assailants gain entry to complex; main building engulfed in flames.
22:45: Security staff try to retake; come under heavy fire; retreat.
23:20: Attempt 2 to retake. Success. Fighting moves to the annex.
Midnight: Fighting continues at annex, 2 die.
01:15: Mr Stevens arrives at hospital. Dies from smoke inhalation.
02:30: Security forces regain control of annex.

Tweets Per Hour
Note: the Egyptian attack is against a background of violent events; Libya and Yemen are anomalous

• News and Twitter peak near the same time
• Strong periodicity to the data
• Dramatic variation by country
• Most retweeted are news, but they are a small fraction of actors

Overall Tweet Network
Note: there are a few sources that are picked up

Benghazi Consulate

Trust & Source

Country       Adaptation  Nationalities  Protest  Terrorist Orgs  Ethnic Groups  Violence  War  Youth
Jordan             0           -1           0           2               1            0       0     2
Kuwait             2            2           0           0              -2            0       0     0
Saudi Arabia       2            0           0           2              -1            0       0     2
UAE                2            0           0           0              -2            0       0     2
Yemen             -2            2           1           1               2            2      -1     1
Bahrain            0            0           1           0              -1            2       0     2
Egypt              1            0           1           1              -2            2       0     1
Iran               0            1           0          -2               1            0      -2     0
Iraq               0            0           0          -1              -2            0      -1     0
Lebanon            0            1           0          -1               1            2       2     0
Libya              1            1           0           0               0            0      -1     0
Syria              0            1           1           2               0            1      -2     0
Tunisia            1            1           2           0              -2            2       0     1

Scale:
-2  twitter and not news
-1  more twitter than news
 0  neither
 1  more news than twitter
 2  news and not twitter

• Low civil unrest
  – News & Twitter distinct
  – Ethnic groups more discussed in twitter
• High civil unrest
  – More twitter on war, ethnics, terrorists
  – Less reliance on news

Relation of Media Users to Revolutionary Activity

• Similar number of high similarity users
• Low civil unrest: personalized discussion
  – More users, more users with high similarity in non-Arabic tweets, more “social interaction” through replies in small groups
• High civil unrest: get the message out!
  – Lower similarity among non-Arabic tweeters, less “social” interaction and more inciting through retweets

[Chart: bar chart comparing Low Civil Unrest and High Civil Unrest. Y-axis: 0 to 35000. Categories: Users; Highly Similar All Topics; Highly Similar in English Topics; Reply Users; Retweet Users.]

Varying View of Terrorist and Association with Nationalities

[Figure: network panels for News and Twitter, shown for High Instability and Low Instability.]

WIDE DATA

Cyber Attacks

Problem and Approaches

• Small signal, often spatially and temporally distributed
• Ocean of data
• Interesting “stuff” is in the middle

• Spectral Clustering
  – Combine different kinds of things such as network position and use of certain words
• Lasso
  – Often used in machine learning to identify the network of connections among variables
• Geo-temporal visualization
  – Movies!

Web Site Threats Encountered

Exploits: Countries that act as wayports

Argentina, Niger, Dem. Rep. of Congo, Angola

Latvia, Moldova, Georgia, Croatia

Cyber Connection

• Former USSR states mostly attack each other in cyberspace

• Strongest attacks are on Russia, and from Ukraine

• In social media Russia mainly talks to itself and Belarus

• Ukraine, Estonia, Lithuania, and Latvia use social media to build NATO sympathy and share approach

• Dirty bombs may be associated with cyber attacks

News – Increase in Cyber Attacks

[Figure legend: Increase, Decrease, No Impact]

IRRELEVANT DATA

Extremist Groups

Who’s on Social Media?

Organizations Individuals

BOTS

Social Media Feature Integration

Data Representation: Feature Integration

Graphs: Following Graph, Mention Graph, Hash Tag Graph, …

Dimensional reduction through extraction of the k lead eigenvectors of each graph

$Z_{n \times (m+4k)} = [\, A_{n \times m} \;\; F_{n \times k} \;\; M_{n \times k} \;\; H_{n \times k} \;\; \dots \,]$
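A minimal sketch of this kind of feature integration, assuming NumPy is available. The function names are hypothetical, and two choices here are assumptions rather than something the slide states: directed graphs are symmetrized before the eigendecomposition, and "lead" is read as "largest eigenvalue". The slide's width of m + 4k implies four graphs, of which only three are named, so the sketch takes any list of graphs and produces m + g·k columns for g graphs.

```python
import numpy as np


def lead_eigenvectors(adj: np.ndarray, k: int) -> np.ndarray:
    """Return an n x k matrix holding the k leading eigenvectors of a graph."""
    sym = (adj + adj.T) / 2.0            # assumption: symmetrize directed graphs
    vals, vecs = np.linalg.eigh(sym)     # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]   # indices of the k largest eigenvalues
    return vecs[:, order]


def integrate_features(A: np.ndarray, graphs, k: int) -> np.ndarray:
    """Stack an n x m feature block A with k eigenvectors per graph, column-wise."""
    blocks = [A] + [lead_eigenvectors(g, k) for g in graphs]
    return np.hstack(blocks)


# Made-up data: n = 6 accounts, m = 2 features, three random graphs, k = 2.
rng = np.random.default_rng(0)
n, m, k = 6, 2, 2
A = rng.random((n, m))
graphs = [(rng.random((n, n)) < 0.4).astype(float) for _ in range(3)]
F = lead_eigenvectors(graphs[0], k)
Z = integrate_features(A, graphs, k)
```

With three graphs the result has 2 + 3·2 = 8 columns; adding a fourth graph would give the m + 4k width shown on the slide.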

Detection/Classification

Iterative Classification

• Accounts with high following and mention degree are often falsely included in detected communities

• Celebrity, news media, political leader and corporate accounts must be systematically removed through classification

Case Study Work Flow:
1. Search
2. Collect
3. Extract Features
4. DETECT: Remove Official Accounts
5. CLUSTER: Develop Training Set
6. DETECT: Extremist Community
7. CLUSTER: Identify Community Sub-Structure
8. Analyze / Intervene

[Figure legend: Supervised Learning Task; Unsupervised Learning Task]

ISIS Social Media Network

Reciprocal Mentions: Trust

Arabic tweeters have since been suspended

Group’s activity driven by events

Joining is through following Imams
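Reciprocal mentions, which the slide reads as a sign of trust, can be pulled out of a mention edge list in a few lines. This is a sketch under assumptions: the `(source, target)` edge format and the function name are illustrative, and self-mentions are ignored by choice.

```python
def reciprocal_mentions(mentions):
    """mentions: iterable of (source, target) pairs, one per mention.

    Returns the set of unordered account pairs that have mentioned each other.
    """
    # Collapse repeated mentions and drop self-mentions.
    directed = {(s, t) for s, t in mentions if s != t}
    return {frozenset((s, t)) for s, t in directed if (t, s) in directed}


# Made-up example: a<->b and c<->d are reciprocal, the rest are one-way.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c"), ("b", "c")]
pairs = reciprocal_mentions(edges)
```

Using `frozenset` for each pair means ("a", "b") and ("b", "a") collapse into one undirected tie.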

Manipulating Community Structure

Step 1. Botnet Promotion: Core Firibi Bots tweet 100-200 times over a 30-100 day time period, mentioning other core members, bot benefactors, and highly central members of the OEC.

Step 2. By mentioning Firibi Benefactors, other Core Bots, and highly central members of the OEC, the botnet gains followers from within the OEC and is able to promote the benefactor accounts.

Example: Firibi Benefactor

Example: Core Firibi Bot

Example: Firibi Follower

App Sign Up, solicits donations for children of Syria

Estimates – 25 to 50% of tweeters not human

Change of Number of Tweeters after Suspended Users are Removed

• After Oct 2012, 10% or more of tweeters are suspended
• Algeria often has 20-25% of tweeters suspended
• Change in political reality
  – More violent states have fewer suspended
  – Less violent states have more suspended

Change of Closeness after Suspended Users are Removed

• Removing suspended tweeters has mixed impact

• In Qatar and Kuwait there is a 0.5 increase in closeness

• In Algeria closeness sometimes drops by 0.5
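The mixed effect of removing suspended accounts can be reproduced on a toy graph. The sketch below is illustrative only: the slides do not say which closeness variant was used, so this assumes the mean of per-node closeness with the Wasserman-Faust scaling for disconnected graphs, on an undirected network, and all names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency as {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness(adj, node):
    """Closeness of one node via breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reached = len(dist) - 1          # nodes reachable from `node`
    if reached == 0 or len(adj) <= 1:
        return 0.0
    total = sum(dist.values())
    # Scale by the fraction of the network that is reachable.
    return (reached / total) * (reached / (len(adj) - 1))


def mean_closeness(adj):
    return sum(closeness(adj, v) for v in adj) / len(adj)


def remove_nodes(adj, removed):
    """Drop nodes (e.g. suspended accounts) and every tie that touches them."""
    removed = set(removed)
    return {u: nbrs - removed for u, nbrs in adj.items() if u not in removed}


# Star network: hub "h" with four leaves.
star = build_adjacency([("h", "a"), ("h", "b"), ("h", "c"), ("h", "d")])
before = mean_closeness(star)
after_leaf = mean_closeness(remove_nodes(star, ["d"]))   # peripheral account removed
after_hub = mean_closeness(remove_nodes(star, ["h"]))    # central account removed
```

On this toy network, removing a peripheral account raises mean closeness slightly, while removing the hub collapses it to zero, which is one way a removal can push the metric in either direction.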

Impact on Network Metrics

• 25% of nodes removed: chance you predicted the top 3 correctly drops to 60%

• 25% extra nodes (undetected bots): chance you predicted the top 3 correctly drops to 80%

[Figure panels: NODE REMOVE, NODE ADD]
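A node-removal experiment of this kind can be sketched with a small simulation. Everything specific here is an assumption: the slide does not name the metric, so the sketch ranks by degree; "correct" is taken to mean the exact top-3 set is recovered; and the random test graph is made up, so its recovery rate will not match the 60% figure.

```python
import random


def top_k_by_degree(edges, k=3):
    """Top-k nodes by degree; ties are broken by node id for a stable result."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return ranked[:k]


def top_k_recovery_rate(edges, nodes, frac=0.25, k=3, trials=200, seed=0):
    """Share of trials in which the observed top-k set equals the true one
    after a random fraction `frac` of nodes has been removed."""
    rng = random.Random(seed)
    truth = set(top_k_by_degree(edges, k))
    n_drop = int(frac * len(nodes))
    hits = 0
    for _ in range(trials):
        dropped = set(rng.sample(sorted(nodes), n_drop))
        kept = [(u, v) for u, v in edges if u not in dropped and v not in dropped]
        if set(top_k_by_degree(kept, k)) == truth:
            hits += 1
    return hits / trials


# Made-up random graph on 40 nodes.
graph_rng = random.Random(1)
nodes = list(range(40))
edges = [(u, v) for u in nodes for v in nodes if u < v and graph_rng.random() < 0.15]

rate_full = top_k_recovery_rate(edges, nodes, frac=0.0)
rate_sampled = top_k_recovery_rate(edges, nodes, frac=0.25)
```

The node-add case (undetected bots) could be simulated the same way by attaching extra random nodes before ranking instead of dropping existing ones.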

Summary

• Data is everywhere
• New methodologies are changing the questions that can now be answered
• Most new approaches are focused on low hanging fruit
  – Counting things, trends
• A deeper understanding of the data is needed to know what it really represents - sampling
• Move from geo-correlation to geo-interpretation
• Move from temporal-correlation to temporal forecasting
• Move from standard statistics to wide data techniques
• Move from standard social network analysis to high dimensional networks

Related Papers

• Kathleen M. Carley, Momin Malik, Peter M. Landwehr, Jürgen Pfeffer, and Michael Kowalchuck, 2016, in press, “Crowd Sourcing Disaster Management: The Complex Nature of Twitter Usage in Padang Indonesia.” Safety Science.

• Peter M. Landwehr, Wei Wei, Michael Kowalchuck and Kathleen M. Carley, 2016, in press, “Using Tweets to Support Disaster Planning, Warning and Response.” Safety Science. DOI: 10.1016/j.ssci.2016.04.012

• Kathleen M. Carley, Wei Wei and Kenneth Joseph, Nov 2015, “High Dimensional Network Analytics: Mapping Topic Networks in Twitter Data During the Arab Spring.” In Shuguang Cui, Alfred Hero, Zhi-Quan Luo and Jose Moura (eds), Big Data over Networks, Cambridge University Press.

• Will Frankenstein, Binxuan Huang, Kathleen M. Carley, 2016, “NATO Trident Juncture on Twitter: Public Discussion.” Carnegie Mellon University, School of Computer Science, Institute for Software Research, Pittsburgh, Pennsylvania, Technical Report CMU-ISR-16-100.

• Kenneth Joseph, Peter M. Landwehr, Kathleen M. Carley, 2014, “Two 1%s Don't Make a Whole: Comparing Simultaneous Samples from Twitter's Streaming API.” In Proceedings of the 7th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction (SBP 2014), Washington, DC, United States, 1 April 2014 through 4 April 2014, Lecture Notes in Computer Science, 8393: 75-83.

• Wei Wei, Kenneth Joseph, Huan Liu, and Kathleen M. Carley, 2015, “The Fragility of Twitter Social Networks Against Suspended Users.” In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), August 25-28, 2015, Paris, FR.

• Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley, 2013, “Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose.” In Proceedings of the International AAAI Conference on Weblogs and Social Media (ICWSM), July 8-10, Boston, MA.

• Kathleen M. Carley, Jürgen Pfeffer, Fred Morstatter, and Huan Liu, 2014, “Embassies Burning: Toward a Near Real Time Assessment of Social Media Using Geo-Temporal Dynamic Network Analytics.” Social Network Analysis and Mining, 4:195.