Social Network Analysis Approach and Applications

Joshua S. White
PhD Candidate, Engineering Science

April 22, 2014

Committee Members:
Jeanna N. Matthews, PhD (Advisor)
John S. Bay, PhD (External Examiner)
Chris Lynch, PhD
Chen Liu, PhD
Stephanie C. Schuckers, PhD

Clarkson University


Dissertation defense entitled: Social Network Analysis Applications and Approach



Outline

• Motivation

– Problem Questions

• Method & Publications

• Coalmine

• PySNAP

• Established Dataset

– Insights into the Data

• Botnet Command & Control Detection

• Phishing Website Detection

– Phishing Website Detection Continuum: ML based detection

• Malware Infection Vector Detection

• Actor Identification

• Event Identification

• Conclusions

• Future Work

• Acknowledgements

• References

• Contact

• Questions

• Supplemental Material


Motivation

Partially inspired by Gladwell’s book, The Tipping Point [1], in which he discusses how life can be thought of as an epidemic. Some criticism exists as to Gladwell’s rigor; however, for our use it is about inspiration and motivation, not accuracy.

The Book’s Key Points (for our purposes)

• Actors (Connectors, Mavens, Salesmen).

• Information spreads like disease.

• Ideas reach a tipping point (critical mass).

Let’s Face It - Social Networks Are Fun

• We are a social species that enjoys communicating and self-adulation.


Problem Questions

• Can we come up with a way of classifying users based on actor types?

• Can we determine who the opinion leaders or influencers are?

• Can we determine how information spreads on these networks?

• Can we detect malicious social network use?

• Are there information security applications for social network data-mining?


Method & Publications

• Establish a reliable collection mechanism.

• Establish a large dataset that can be utilized to answer each question.

• Use a case study approach, whereby each case feeds the next.

• Produce each case study as an individual publication or presentation.

– 3 x Published Proceedings

– 2 x Pending Proceedings

– 3 x Invited Presentations


Coalmine

• Scales well based on initial tests

• Useful for both manual and automated detection

• Allowed us to refine our data collection capabilities

At the Time (Future Work)

• Rebuild of the tool to fix scaling limitations

• More extensible Map/Reduce method

• Inclusion of native multi-threading capability

• New storage and distribution method

• New algorithms for automated opinion leader detection


PySNAP

• Fixes all of the previous issues with Coalmine

• Completely reimplemented in Python with a few supporting Bash scripts

• Utilizes the Disco MapReduce framework, also built on Python

• Includes a better method for data capture, which had previously been bolted on to Coalmine

• Allowed us to establish a large dataset for future work
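
The map/reduce pattern that PySNAP builds on can be illustrated with a minimal sketch in plain Python. It deliberately does not use the Disco API, and it is not the PySNAP code: the function names (`map_hashtags`, `reduce_counts`, `run_job`), the sample tweets, and the simple hashtag regular expression are all assumptions made for illustration.

```python
import json
import re
from collections import defaultdict

# Simplifying assumption: a hashtag is "#" followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def map_hashtags(line):
    """Map phase: emit (hashtag, 1) for every hashtag in one JSON-encoded tweet."""
    tweet = json.loads(line)
    for tag in HASHTAG.findall(tweet.get("text", "")):
        yield tag.lower(), 1


def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each hashtag."""
    totals = defaultdict(int)
    for tag, count in pairs:
        totals[tag] += count
    return dict(totals)


def run_job(lines):
    """Run the map phase over every input line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_hashtags(line))
    return reduce_counts(pairs)


# Hypothetical input: one JSON document per line, as a capture file might hold.
sample = [
    json.dumps({"text": "Watch this #KONY2012 video"}),
    json.dumps({"text": "#kony2012 is trending #news"}),
]
hashtag_counts = run_job(sample)
```

In a distributed framework the map calls would run in parallel over chunks of the dataset and the reduce step would merge their partial results; the single-process version above only shows the shape of the two phases.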


Established Dataset

• Over the course of 2012 we collected 165 TB of Twitter Data (Uncompressed)

– 175 Days Collected, 147 Full Days

∗ Estimated 45 Billion Tweets

– Recently released estimates place total Twitter traffic at 175 million tweets per day in 2012

– Thus our daily collection rates varied between 50% and 80% of total Twitter traffic.

– We captured complete tweet data in JSON format using Twitter’s REST API.

∗ This data includes a large number of additional fields other than the message text, all of which can be taken into account when doing measurements.


Insights into the Data


Botnet Command & Control Detection

• Joshua S White, Jeanna N Matthews, and John L Stacy. Coalmine: an experience in building a system for social media analytics. In SPIE Defense, Security, and Sensing, pages 84080A-84080A. International Society for Optics and Photonics, 2012.


Botnet Command & Control Detection Continued

Date/Time                       UID                Text                         MSG Entropy       Source
Sun Mar 20 15:27:02 +0000 2011  49492150668365824  Shutdown -r now              3.37355726227518  http://twitter.com/Ebastos
Sun Mar 20 01:25:20 +0000 2011  49280326475853825  # shutdown -h now            3.37355726227518  http://twitter.com/ohdediku
Sun Mar 20 21:40:53 +0000 2011  49586229964062720  $ sudo shutdown -h now       3.37355726227518  http://twitter.com/souzabruno
Sun Mar 20 19:38:41 +0000 2011  49555476769280000  Text: sudo shutdown -h now   3.37355726227518  http://twitter.com/stormyblack
Sun Mar 20 18:51:51 +0000 2011  49543693820116992  shutdown -now                3.37355726227518  http://twitter.com/godzilla2k9
Sun Mar 20 18:52:30 +0000 2011  49543856840126464  shutdown -h now !:           3.37355726227518  http://twitter.com/ph3nagen
Sun Mar 20 18:52:30 +0000 2011  49600582113177600  shutdown -H now.             3.37355726227518  http://twitter.com/willybistuer
Sun Mar 20 22:37:54 +0000 2011  49597117039251457  elmenda: su shutdown -h now  3.37355726227518  http://twitter.com/NeoVasili
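
The slides do not state how the MSG Entropy column was computed. As a rough sketch of the general idea, the function below computes character-level Shannon entropy in bits per character; this choice of formula is an assumption, the name `shannon_entropy` is hypothetical, and the sketch is not expected to reproduce the values in the table.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the relative frequency p of each character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Command-like text copied from the table, scored with the assumed formula.
score = shannon_entropy("shutdown -h now")
```

A score like this could serve as one feature for flagging short, command-like messages, alongside the other tweet fields mentioned earlier.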


Phishing Website Detection

• Joshua S White, Jeanna N Matthews, and John L Stacy. A method for the automated detection phishing websites through both site characteristics and image analysis. In SPIE Defense, Security, and Sensing, pages 84080B-84080B. International Society for Optics and Photonics, 2012.


Phishing Website Detection Continued

(F)raud / (L)egit: Paypal Fraudulent
URL: http://si4r.com/_paypal.co.uk/webscr.html?cmd=SignIn&co_partnerId=2&pUserId=&siteid=0&pageType=&pa1=&i1=&bshowgif=&UsingSSL=&ru=&pp=&pa2=&errmsg=&runame=
Structural Fingerprint: 0,7,1,0,2
Page Title: RETURNED NOTHING
pHash Value: 16716169687489800000
Hamming Score: 1

(F)raud / (L)egit: Paypal Legitimate
URL: https://www.paypal.com/cgi-bin/webscr?cmd=_login-submit&dispatch=5885d80a13c0db1f8e263663d3faee8d1e83f46a36995b3856cef1e18897ad75
Structural Fingerprint: 27,3,0,0,2
Page Title: Redirecting - Paypal
pHash Value: 18439707190431800000
Hamming Score: 0
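
Comparing perceptual hashes by Hamming distance can be sketched as follows. The pHash values in the table appear to have been rounded, so the example uses made-up 64-bit values instead; the function names, the sample hashes, and the similarity threshold are all hypothetical, and the slides do not define how the Hamming Score column is derived from the distance.

```python
def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def looks_similar(hash_a, hash_b, threshold=10):
    """Treat two 64-bit perceptual hashes as similar if few bits differ.

    The default threshold of 10 bits is an illustrative assumption,
    not a value taken from the slides.
    """
    return hamming_distance(hash_a, hash_b) <= threshold


known_page = 0xF0F0F0F0F0F0F0F0  # hypothetical pHash of a legitimate page
candidate = 0xF0F0F0F0F0F0F0F3   # hypothetical pHash of a suspect page

distance = hamming_distance(known_page, candidate)
similar = looks_similar(known_page, candidate)
```

A page that renders almost identically to a known brand page but is served from an unrelated URL would be a candidate for closer inspection.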


Phishing Website Detection Continuum: ML based detection

• Title: An Image-based Feature Extraction Approach for Phishing Website Detection

• Authors: Hao Jiang, Joshua White, Jeanna Matthews

• Builds off of our previous work in phishing website detection, specifically the image analysis approach

• Utilizes a Machine Learning based approach to identify the most prominent images on a webpage, usually the site’s logo

• Is able to detect phishing sites that the pHash/Hamming distance method concludes are not similar.

– These are the “poor quality” phishing sites


Malware Infection Vector Detection

• BEK (The Blackhole Exploit Kit) was the predominant MaaS (Malware as a Service) in 2012.

• It accounted for an estimated 29% of all malicious URLs.

• BEK licenses went for around $1,500 USD

• BEK used Twitter as its primary means of spreading infectious URLs

• Our method detects these malicious URLs and infectious accounts on a large scale


Malware Infection Vector Detection Continued

• Joshua S. White and Jeanna N. Matthews, “It’s you on photo?: Automatic detection of Twitter accounts infected with the Blackhole Exploit Kit,” 2013 8th International Conference on Malicious and Unwanted Software: "The Americas" (MALWARE), pp. 51-58, 22-24 Oct. 2013. doi: 10.1109/MALWARE.2013.6703685


Actor Identification

• Title: Connectors, Mavens, Salesmen and More: Actor Based Online Social Network (OSN) Analysis Method Using Tensed Predicate Logic

• Authors: Joshua White and Jeanna Matthews

• Submitted to KDD2014 (Knowledge Discovery and Data Mining) Conference “Data Mining for Social Good”

• Utilized multiple definitions of actor types to create tensed predicate logic descriptions

• Translated these logics into semantic queries

• Tested the queries against a known dataset


Actor Identification Continued

• Time is important

• Previous methods did not take event sequence into account

• Liaison Example:


Event Identification

• Still in the initial stages of this part of our work

• Given a general topic (a search term or hashtag), we can identify most of the related content from the dataset

• We have a means for alerting on all new posts regarding that term

• We can dig historically through the data and trace the path that an idea took

• We can identify the influential individuals (accounts) that played a part in the information spread

• Our test case was the KONY2012 Event


Event Identification Continued

• Top 10 Twitter Accounts, sending and receiving KONY2012 related Tweets

Directed @ Account Names  In-Degree  Origin Account Names  Out-Degree
tothekidswho              625        twittonpeace          47
Invisible                 125        interhabernet         44
youtube                   118        DailyisOut            44
helpspreadthis            95         MEDYA_TURK            42
justinbieber              83         haber_42              35
prettypinkprobz           48         gundem_haber          30
ninadobrev                48         twittofpeace          22
MeekMill                  47         korkmazhaber          19
ladygaga                  43         tarafsiz_haber        14
KendallJenner             39         Son_DakikaHaber       13
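
In-degree and out-degree rankings of this kind can be derived from a list of directed edges. The sketch below is an illustration only: the edge list is invented, `degree_counts` is a hypothetical helper, and it assumes that every tweet directed at an account counts as one edge (repeat mentions are counted each time), which the slides do not specify.

```python
from collections import Counter


def degree_counts(edges):
    """Return (in_degree, out_degree) Counters for (source, target) edges."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1  # the source account sent a tweet
        in_degree[target] += 1   # the target account was addressed
    return in_degree, out_degree


# Hypothetical edges: (account that sent the tweet, account it was directed at)
edges = [
    ("alice", "tothekidswho"),
    ("bob", "tothekidswho"),
    ("alice", "youtube"),
]
in_degree, out_degree = degree_counts(edges)
top_receivers = in_degree.most_common(1)
```

Calling `most_common(10)` on either counter would give a top-10 ranking in the same shape as the table.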


Event Identification Continued

• Top 10 Twitter Accounts, retweeting and being retweeted regarding KONY2012

Retweeting Accounts  In-Degree  Message Source   Out-Degree
MedyaKonya           8          Stop____Kony     2642
twittonpeace         8          tothekidswho     753
haber_42             7          konyfamous2012   716
gundem_haber         7          Kony2012Help     615
korkmazhaber         7          stop______kony   353
DailyisOut           7          WESTOPKONY       225
interhabernet        6          zaynmalik        221
KONYA_ZAMAN          6          iSayStopKony     127
konya_time           6          Stop_2012_Kony   80
konyagazetesi        5          Kony_Awareness   72


Conclusions

• We aimed to answer the following questions when we started this work:

– Can we come up with a way of classifying users based on actor types?

– Can we determine who the opinion leaders or influencers are?

– Can we determine how information spreads on these networks?

– Can we detect malicious social network use?

– Are there information security applications for social network data-mining?

• I think we did a good job at providing at least some cursory answers to these questions


Future Work

• We have applied for a data grant from Twitter

• We are in the process of moving our entire dataset to the lab at Clarkson and building up a new capture/analysis system

• I am planning on pursuing the semantic side of social network analysis

– Currently only one SNA semantic ontology exists and it’s only on paper.

– I am planning on rolling both the actor and event analysis into one approach which will be part of a new ontology


Acknowledgements

• I would like to thank:

– Dr. Matthews

– Dr. Bay

– Dr. Lynch

– Dr. Schuckers

– Dr. Liu


References

[1] Gladwell, M. (2000). The Tipping Point. Boston: Little, Brown and Company.


Contact

[email protected]


Questions

Questions?


Supplemental Material


• DDFS


• Twitter JSON Key Fields

profile_link_color            Coordinates            verified
In_reply_to_screen_name       Geo                    time_zone
In_reply_to_status_id         text                   statuses_count
In_reply_to_status_id_str     entities               Contributors
In_reply_to_user_id           place                  protected
profile_background_color      contributors_enabled   truncated
profile_background_tile       default_profile        retweeted
default_profile_image         description            is_translator
follow_request_sent           followers_count        location
friends_count                 geo_enabled            favorites_count
profile_image_url_https       listed_count           following
profile_background_image_url  notifications          retweet_count
background_image_url_https    name                   created_at
profile_image_url             lang                   Favorited
sidebar_border_color          use_background_image   Id_str
sidebar_fill_color            screen_name            Created_at
profile_text_color            show_all_inline_media  Id
url                           utc_offset
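
As a small illustration of how fields like these can be pulled out of a captured tweet, the sketch below parses one JSON document and keeps a handful of values. It assumes that account-level fields such as `screen_name` and `followers_count` are nested under a `user` object, as in Twitter's v1.1 tweet format; the helper name `extract_fields` and the sample tweet are hypothetical.

```python
import json


def extract_fields(raw_line):
    """Pull a few tweet-level and user-level fields out of one JSON tweet."""
    tweet = json.loads(raw_line)
    user = tweet.get("user", {})  # assumed location of account-level fields
    return {
        "id_str": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "text": tweet.get("text"),
        "retweet_count": tweet.get("retweet_count", 0),
        "screen_name": user.get("screen_name"),
        "followers_count": user.get("followers_count", 0),
    }


# Hypothetical tweet, serialized the way a one-document-per-line capture might store it.
raw = json.dumps({
    "id_str": "1",
    "created_at": "Sun Mar 20 15:27:02 +0000 2011",
    "text": "example message",
    "retweet_count": 3,
    "user": {"screen_name": "example_user", "followers_count": 42},
})
record = extract_fields(raw)
```

Using `.get` with defaults keeps the extraction tolerant of tweets in which a field is missing.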


• BEK Infectious Account Visualization


• Tensed Predicate Logic Key


• Coalmine User Interface
