85
User Entity Behavior Analysis for Cyber Security 1 Dr. Chin-Hao, Eric, Mao ([email protected]) Institute for Information Industry 2016.09.13

User Entity Behavior Analysis for Cyber Security 2 - Track 2.1 - Mr. Eric Mao... · User Entity Behavior Analysis for Cyber Security 1 Dr. Chin-Hao ... University of Science and Technology,

Embed Size (px)

Citation preview

User Entity Behavior Analysis for Cyber Security

1

Dr. Chin-Hao, Eric, Mao ([email protected])Institute for Information Industry

2016.09.13

About me• Section Manager, Cyber Trust Technology Institute, Institute for Information Industry

• EDUCATIONS and EXPERIENCES• Ph. D., National Taiwan University of Science and Technology, Computer Science

• Visiting scholar in Carnegie Mellon University, CyLab • Presentation: FIRST 2013/2014, CSA-APAC 2013, APCERT2014, (ISC)2 APAC Congress 2016

• RESEARCH EXPERIENCES• big data analytics, network security, intrusion detection, mobile APP static analysis

• more than 20+ academic journal and conferences papers

2

Outline

3

Outline• What “Data Analytics in Information Security” do.• Supervised Learning

– Decision Tree– if2

• Unsupervised Learning- GMiner• Anomaly Detection & Graph Mining

– ChainSpot (Sequential)– PlayWright (Graph)

• Recommendation System & Data Visualization– CloudOrion

• Conclusions

4

What “Data Analytics in Information Security” do

• Log Analysis– IDS Logs– Firewall Logs– Web Server Access Logs– Proxy Logs– Active Directory Logs

• Social Media– Twitter, FB, …

5

• Host IDS (HIDS) vs. Network IDS (NIDS)

• Detect the known attacking signature– Port, attacking vector, or payload

– Highly depends on human fine tuning

– False alarm issue• Usually contains following information:

– Event Name, Source IP, Source Port, Dest. IP, Dest. Port,

TimeStamp

• A well-studied domain:

– False Alarm Reduction– Multi-steps Attacks correlation

• Applied scenario: SOC (security operation center), CERT,

ISAC...

Intrusion Detection System (IDS)

Firewall Logs

• Firewall: blocking connection or transmission based on known

blacklist or pre-defined policy

• Easy to deployment and use.

• Lacking of analysis ability as facing unseen attacking

signature. (e.g. IP, Domain Name)

• Contain following information:– Source IP, Source Port, Dest. IP, Dest. Port, Permit/Deny

• Also well-studied on large-scaled connection correlation

– Honeypot attacking pattern analysis

– Botnet structure analysis

Web Server Accessing Log

• Application-layer malicious behavior detection

• Web Server Accessing Logs contain:

– Source IP, Accessed Path, Access Status, Timestamp, Http

header (optional), File Size(optional)

• Usually, a tradeoff is between log visibility and server

execution performance

• Only application-layer malicious behavior such as script can be

detected. (entry point into system level threats)

• Research emerged around 2005:

– Analyze the accessed path to detect SQL injection

– Graph mining on association graph of web access

– Build a classifier to predict a risk of a given request

Users’ (visitor) Behavior!!

Proxy Logs

• Proxy is an agent deployed on gateway of an enterprise

network.

• Recorded the Web surfing behavior

• Why “web surfing behavior” is important

– URL connection for C&C server communication and control.

– Detect the phishing web site.

– Detect malicious scanning attack.

• Containing:

– Source IP, Dest. URL, TimeStamp, Web Agent Info., MIME

• Research emerged around 2010– Analyze the malware behavior inside the enterprise network.

– Correlate other logs (IDS, FW, AD logs, etc.) to evaluate risk of an

account

User’s Behavior!!

Active Directory Logs

• Operating system records connections or events, between client side and server side, of registered accounts.

• Containing:–TimeStamp, Account, Source IP, Event Name, Sub-

Event Annotation, …• Can be used to detect hidden malicious behavior:

–Anomaly login/logout behavior

–Anomaly resource allocation –Anomaly permission granting

• Recently, AD is well-used on event tracking, however, it resulted in issue that AD data is big scale–Companies in industry focused on AD-related research

in last 2-3 years.

User’s Behavior!!

Different Kinds of Machine Learning

Supervised Learning –Decision Trees

A Decision Tree Example – The Weather DataID code Outlook Temperature Humidity Windy Play

a

b

c

d

e

f

g

h

i

j

k

l

m

n

Sunny

Sunny

Overcast

Rainy

Rainy

Rainy

Overcast

Sunny

Sunny

Rainy

Sunny

Overcast

Overcast

Rainy

Hot

Hot

Hot

Mild

Cool

Cool

Cool

Mild

Cool

Mild

Mild

Mild

Hot

Mild

High

High

High

High

Normal

Normal

Normal

High

Normal

Normal

Normal

High

Normal

High

False

True

False

False

False

True

True

False

False

False

True

True

False

True

No

No

Yes

Yes

Yes

No

Yes

No

Yes

Yes

Yes

Yes

Yes

No

A Decision Tree ExampleOutlook

humidity windyyes

no yesyes no

sunny overcast rainy

high normal false true

Decision tree for the weather data.

Supervised Learning –Fuzzy Rule-based Classifier

iF2: An Interpretable Fuzzy Rule Filter for Web LogPost-Compromised Malicious Activity Monitoring

Chih-Hung Hsieh, Yu-Siang Shen, Chao-Wen Li, and Jain-Shing Wu

AsiaJCIS 2015

16

Motivation• Traditionally, log tracking depending

on heavy human effort • Common method for identifying APT

attacks become less useful because of various camouflages: VPN, Packing.

• That means you should have lots of time, domain experience, and good luck very much!

17

Interpretable Fuzzy Rule Filter (iF2)• In fuzzy logic, a fuzzy rule base consists of

multiple fuzzy rules with following structure

18

Exp. 3 – Using iF2 to Analyze Web Access Log• One real web access log dataset from a

certain organization in Taiwan was collected for usage of demonstrating the performance of iF2.

• Total 5.36 gigabytes and 45,669,917 records from 545 distinct internet addresses are analyzed and labeled.

• There are only 22 suspicious internet addresses and 523 benign ones, and it is a typically extremely unbalanced dataset.

19

The 11 Statistics Used as Features

20

Exp. – Resulted Rules• The Human interpretable rule base of iF2 to differentiate suspicious internet addresses of 545-IPs from benign ones.

21

22

Anomaly Detection Probability Model - ChainSpot

ChainSpot: Mining Service Logs for Cyber

Security Threat Detection

23

Motivation

• Target of APT is usually specific and personal

– Attackers usually invade enterprise and access sensitive data by using compromised accounts.

• We focus on detecting anomaly behavior or behavioral deviation for each employee (account) in an enterprise.

24

ChainSpot - Method

• We concern the anomaly behaviors extracted from Active

Directory (Kerberos or NTLM authentication) & Proxy

Server (web surfing usage) for each account.

• Sequential Data Synchronizer

– Correlate heterogeneous data (AD and Proxy) based on IP

Address and interval time (3 days) of SOC Tickets

25

Proxy

timeline

… …

SOC

Ticket i3 days

AD… …

ChainSpot - Method

• ChainSpot model sequential behaviors as a probabilistic model

as baseline, and any change on employee’s behaviors will results

in an anomaly which may imply:

– Account has been compromised

– APT exists in an employee’s host

• To properly model the sequential behaviors as a probabilistic

model for each account, and to detect the anomalies based on

deviation of account’s behaviors.

– Markov Model for building probabilistic model

– Graph Edit Distance for estimating behavioral deviation

26

ChainSpot Method – Markov Model

Transition Probability Matrix

27

An example of Markov Model

For each account, we will build his Markov model using his normal action sequences

Heterogeneous Data Correlation

• For each account, we build his personal profile of state sequences

which describe this account’s sequential behaviors.

• Each state in a sequential data consists of followings

– For AD log sequence

• Event (e.g., No 4624, 4634)

• Reply Code (Only 4771 -> 0x12, 0x18)

– For Proxy log sequence

• A Meta Behavior describing web surfing.

( e.g., GET and Download on microsoft.com then Failed )

28

� HTTP Method

� E,g., GET, POST, …

� Download or Upload� Download when size in > size out

� Upload when size in < size out

� Domain Name

� Second Level Domain

� Access Result� E.g., Allowed, Failed, …

Experiment regarding to real environment

• Environment & Dataset Description:

– Contains about 1,089 accounts.

– Duration from 2015/08/01 to 2015/08/31.

– The active directory domain service of Windows Servers ver. 2008-R2 results in

27,902,857 logs.

– The proxy service in the same duration generates 78,044,332 logs.

• Dataset

– Number of Categories in SOC Tickets: 6

29

Experiments (Cont’d)

• Evaluation

– We divide Aug. logs of each account into three partitions:

• Training data

– A half of unlabeled data mapped by SOC Tickets

• Abnormal Testing data

– Labeled data mapped by SOC Tickets

• Normal Testing data

– Selection from all unlabeled data except Training data

– Compare the difference of Training data between Anomaly and Normal

Testing data

– Experiment Hypothesis

30

Effectiveness of ChainSpot

31

Hypothesis:

The general effectiveness measuring gives 85.66% and 87.17% success rates

in terms of averaging on various types or averaging on different accounts.

How tight between Accounts (AD, Proxy) and IPs

32

33

TrainingNormal TestingAbnormal Testing

4658

4661

Abnormal Data has 4768 -> 4771 0x18, 4771 0x18 -> 4768

A Kerberos authentication ticket (TGT) was requested (4768)

Kerberos pre-authentication failed (Bad Password) (4771 0x18)

Abnormal Data has no 4661 -> 4662, 4662 -> 4658

A handle to an object was requested (4661)

An operation was performed on an object (4662)

The handle to an object was closed (4658)

There is no complete service path in abnormal data, and

which has error authentication by bad password

AD Case Study - d110382

Multiple Accounts login from single IP in short time

34

Normal Testing

Abnormal and Normal Data both have followings :Abnormal Data has no followings :

AD Case Study - administratorToo many logins in closing time

Training

An account failed

to log on(4625)

Special privileges assigned

to new logon(4672)

The domain controller attempted to

validate the credentials for an account

(4776)

An account failed

to log on(4625)

The handle to an object was closed (4658)

A handle to an object was requested (4661)

A new process has been created (4688)

A Kerberos authentication ticket (TGT) was requested (4768)

Kerberos pre-authentication failed (Bad PWD) (4771 0x18)A user account was locked out (4740)

Abnormal Testing

35

Proxy Case Study - b110513

Host connects to Malicious Domain

36

Anomaly Detection Graph Mining - Playwright

Playwright: a Web Server-side

Suspects Detector

3737

Current Framework of Playwright

38

The Anomaly Logs Filter Module

• Intention 1– Web log volume too large to inspect with bareeyes

• Input– Raw logs from web access record

• Output– Filtered logs

• Example– Illegal Characters Verify Module– Unsual Parameter Usage Verify

39

The Attack Scene Matching Module• Intention 2– There may not be only one candidate ofattacking Tactics, Techniques, and Processes (TTPs).

• Input– Kinds of filtered logs

• Output– Attacking Scene with Associated Filtered logs

40

The Graph Constructor Module• Intention 3– Concretize entities and associated relations of suspicious activities.

• Input– Each Attacking Scene withAssociated Filtered logs

• Output – Suspicious Activity Graph• Heterogeneous Graph• Bipartite Graph• Homogenous Graph 41

168.144.85.166

59.124.20.38

118.140.69.162

1.asp

2.asp

personal_taglist.asp

cisc_search.asp

Hacker maintains 3 IPshack.asp

The Graph Pattern Matching Module

• Intention 4– Find trigger points (or threat seeds) where attacker is in an attempt to invade target

• Input– Suspicious Activity Graph• Heterogeneous Graph• Bipartite Graph• Homogenous Graph

• Output– Entity labeled as suspect 42

• Case 1 : IIS Logs – System Compromise• Case 2 : Apache Logs – SQL Injection for

Data Leakage• Case 3 : Apache Logs – System

Compromise43

• Case 1 : IIS Logs – System Compromise• Case 2 : Apache Logs – SQL Injection for

Data Leakage• Case 3 : Apache Logs – System

Compromise44

Suspicious Activity Graph Construction - IIS

File Name IP Address Status Code Illegal Character Hacker’s Action (File)

Hacker’s Action (IP Address

• Scope of Attacking

– Each IP correlates multiple files with illegal character.

Scope of Attacking

45

Initial Threat Seed Finding:

with priori knowledge about the threat pattern• Our forensics team gives us priori knowledge to form the

following patterns:

Initial Seeds Finding Regarding to Known Pattern

47

• How to ranking nodes in a graph based on their referring to each other.

C

A B

•Originally be used to rank Google retrieved pages

Page Rank

Algorithm

Is IP which hacker used different from others in Suspicious Activity Graph ?! - IIS

• Problem:– Investigate the role of IP Address in Suspicious Activity Graph

• Input:– |IPs| x |IPs| Matrix• nij: num of Queried same File

– IPi to IPj• 0: Queried by different File

• Output:– Clustering Result IP X IP

IP1

IP2

.

.

.

IPN

IP1 IP2… IPN

n11 0 … n1N

n21 0 … 0

0 0 … nNN

48

49

0

0.02

0.04

0.06

0.08

0.1

0.12

140.1

12.2

4.1

46

140.9

2.1

8.6

8

140.9

2.1

43.1

22

140.9

2.4

7.1

89

140.9

2.4

6.1

03

101.1

2.1

.70

140.9

2.6

3.2

35

140.9

2.4

6.1

01

140.9

2.5

5.3

2

140.9

2.2

3.1

16

223.1

37.1

2.2

42

140.9

2.4

6.1

86

140.9

2.1

8.6

6

118.1

60.9

8.5

114.1

36.2

44.1

6

140.9

2.1

42.1

27

140.9

2.5

3.1

31

49.2

15.3

3.1

73

49.2

14.3

5.1

16

192.1

68.3

0.7

6

101.1

2.1

20.8

3

125.2

27.1

52.9

2

140.9

2.4

6.8

7

220.1

34.1

85.7

5

140.9

2.5

.26

111.2

51.1

8.5

6

140.9

2.1

43.5

140.9

2.2

2.2

28

140.9

2.3

.100

140.9

2.1

46.2

26

58.1

15.6

2.7

1

140.9

2.1

43.1

04

36.2

26.1

65.1

91

140.9

2.2

3.1

21

140.9

2.6

3.3

7

192.1

68.1

.222

114.1

36.1

82.5

8

27.2

47.6

8.7

3

140.9

2.3

0.3

7

140.9

2.6

2.1

03

195.1

54.1

84.1

26

149.2

02.8

9.1

87

140.9

2.1

41.1

2

140.9

2.1

8.1

39

140.9

2.1

42.5

3

140.9

2.6

3.3

4

140.9

2.6

3.1

29

140.9

2.6

3.1

19

140.9

2.7

1.6

1

140.9

2.6

3.1

87

140.9

2.1

43.2

4

140.9

2.6

2.1

16

140.9

2.3

9.1

9

188.1

65.2

07.1

40

140.9

2.7

7.1

28

140.9

2.1

41.2

18

220.1

30.1

96.2

18

140.9

2.3

0.5

3

140.9

2.1

8.1

99

140.9

2.6

2.1

99

207.4

6.1

3.4

8

140.9

2.6

2.1

59

192.1

68.4

0.1

05

157.5

5.3

9.1

35

140.9

2.3

8.2

157.5

5.3

9.2

44

118.1

63.1

50.2

36

157.5

5.3

9.8

7

140.9

2.1

0.1

97

66.2

49.6

5.2

4

207.4

6.1

3.2

4

140.9

2.6

2.1

08

66.2

49.7

7.2

9

140.9

2.6

2.1

73

140.9

2.5

3.4

8

0.00E+00

1.00E-01

2.00E-01

3.00E-01

4.00E-01

5.00E-01

6.00E-01

140.1

12.2

4.1

46

114.4

4.2

16.1

53

192.1

68.7

0.1

22

140.9

2.1

41.1

50

140.9

2.5

5.2

13

140.9

2.6

2.2

21

110.2

6.4

.111

140.9

2.7

1.7

4140.1

12.2

4.1

49

140.9

2.4

6.1

75

58.2

41.2

.154

192.1

68.6

0.2

28

195.1

54.1

76.1

5140.9

2.6

0.2

53

140.9

2.7

1.2

8140.9

2.3

8.8

6140.9

2.3

0.1

97

140.9

2.3

0.8

2223.1

40.1

92.1

20

140.9

2.1

42.1

18

140.9

2.7

7.6

5140.9

2.1

6.8

059.1

15.6

7.1

93

140.9

2.2

2.1

12

140.9

2.1

46.2

40

140.9

2.1

41.1

21

111.8

0.1

67.2

12

140.9

2.1

8.2

5192.1

68.6

0.2

39

140.9

2.1

46.6

5140.9

2.1

4.4

7140.9

2.1

8.1

81

192.1

68.3

0.4

8175.9

6.2

35.8

5192.1

68.1

.61

140.9

2.1

8.4

2140.9

2.3

.20

140.9

2.5

3.1

17

140.9

2.3

9.3

5140.9

2.1

46.4

8140.9

2.2

3.1

70

140.9

2.2

3.1

27

192.1

68.4

2.2

08

140.9

2.7

1.2

6140.9

2.2

9.5

2140.9

2.1

42.5

3140.9

2.3

9.4

440.7

7.1

67.9

2140.9

2.2

3.9

1140.9

2.2

9.9

7140.9

2.3

8.9

2140.9

2.6

2.1

83

140.9

2.7

7.1

82

182.9

3.5

5.4

3106.1

.51.1

1140.9

2.5

.102

140.9

2.7

1.6

114.2

00.1

22.2

07

140.9

2.3

0.1

8176.1

16.2

18.1

53

140.9

2.2

2.2

23

140.9

2.2

3.5

9140.9

2.1

46.2

29

140.9

2.1

4.4

8140.9

2.6

0.4

1140.9

2.6

3.1

73

140.9

2.1

46.1

49

140.9

2.6

2.1

79

140.9

2.3

.34

140.9

2.7

1.2

27

140.9

2.1

43.7

861.6

5.1

2.5

0114.1

36.3

1.1

94

140.9

2.1

42.6

0140.9

2.5

3.4

8

Pagerank

Belief Propagation

Ni : The BP Score of ith IPNmax : The Maximum BP Score

Ni : The PR Score of ith IPNmax : The Maximum PR Score

Evidence IPs

Never Miss

118.140.69.162

140.92.10.60

140.92.53.48

168.144.85.166

59.124.20.38

Blue: True PositiveGreen: Suspicious

118.140.69.162

140.92.10.60

59.124.20.38

Evidence IPs

Miss 168.144.85.166

Initial Threat Seed Finding:

with priori knowledge about the threat pattern

Initial Seeds Finding without Known Pattern

50

• How about the case that you don’t have any priori knowledge

•Effect of SVD

SVD

Is IP which hacker used different from others in Suspicious Activity Graph ?! (Cont’d)

51

c0

c1

c0

c2

c1

c2

168.144.85.166

168.144.85.166

168.144.85.16

118.140.69.162

118.140.69.162118.140.69.162

59.124.20.38

59.124.20.38

59.124.20.38

SVD

Benign

Malicious

• Hackers usually use different IPs to achieve their goal– Test the Trojan (China Chopper)

• Weak Link IP– Scan the architecture of website

• Outlier (Decomposition) & Pattern (Pagerank)

52

IP1

China Chopp

er

IP2File Name

IP Address

Hacker’s Action (IP Address)

Weak LinkThis kind of IP is not an outlier and easily ignored by SVD

Limitation - SVD

• How to resolve Weak Link problem?!– Use Graph Mining Algo. to propagate the threat score from IP1 (Seed) to IP2

– Random Walk with Restart (RWR)• Threat Seed as Start Point

53

IP1

China Chopp

er

IP2File Name

IP Address

Hacker’s Action (IP Address)

Weak Link

Limitation - SVD

54

1

4

3

2

56

7

910

8

11

12

Random walk with restart

How to propagate the confidence once you have suspect candidate

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

RWR

Ni : The RWR Score of ith IPNmax : The Maximum PR Score

Evidence IPs

Never MissBlue: True PositiveGreen: Suspicious

59.124.20.38 106.185.46.126 140.92.77.78 125.227.251.25

140.92.10.60 66.249.77.107 140.92.18.143

118.140.69.162 140.92.16.59 1.162.96.56

89.145.95.43 140.92.253.6 168.144.85.166

89.145.95.41 140.92.142.60 61.65.12.50

Threat Seed Propagation – Given Data- or Knowledge-Driven Threat Seeds

RWR

Ni : The RWR Score of ith IPNmax : The Maximum PR Score

Evidence IPs

0

0.05

0.1

0.15

0.2

0.25

0.3Never Miss

59.124.20.38 1.162.96.56

118.140.69.162 168.144.85.166

66.249.77.107

106.185.46.126

Blue: True PositiveGreen: Suspicious

Threat Seed Propagation – Given Data- or Knowledge-Driven Threat Seeds

• Case 1 : IIS Logs – System Compromise• Case 2 : Apache Logs – SQL Injection for

Data Leakage• Case 3 : Apache Logs – System

Compromise57

Suspicious Activity Graph Construction - Apache

• Case 2: Frequently SQL Injection1. Previous Pattern can’t be used because of ./ is the index page (e.g.,

index.php, index.html and so on)

2. Need to consider the attribute: Illegal Characteristic

58

File Name

IP Address

Illegal Character

Hacker’s Action (File)

Hacker’s Action (IP Address)

Hacker’sIllegal Character

59

• Problem:

– Investigate the role of IP Address in Suspicious Activity Graph

– Outliers may be the candidates of initial threat seeds

• Input:

– |IPs| x |(Filem, IllegalCharn)| Matrix

• nij: num of queried same (Filem, IllegalCharn)

by IPi and IPj

• 0: Never Queried by

(Filem, IllegalCharn)

• Output:

– Clustering Result

IP X (Filem, IllegalCharn)

IP1

IP2

.

.

.

IPN

(Filem, IllegalCharn)1… (Filem, IllegalCharn)mXn

n11 0 … nmXnX1

n21 0 … 0

0 0 … nmXnXN

Initial Threat Seed Finding:

without priori knowledge

• Our forensics team gives us priori knowledge to form the

following patterns:

Initial Threat Seed Finding:

with priori knowledge about the threat pattern

61

Ni : The PR Score of ith IPNmax : The Maximum PR Score

PAGERANK

0

0.01

0.02

0.03

0.04

0.05

0.06

62.2

10.1

43.2

45

69.5

8.1

78.5

6

210.6

.198.5

140.1

12.1

25.6

6

1.1

61.1

64.1

4

114.4

6.5

7.3

9

36.2

32.1

16.2

00

120.1

10.8

9.3

3

66.2

49.7

1.2

52

202.4

6.5

6.1

36

210.2

42.2

28.3

6

203.1

98.2

27.7

3

223.1

6.1

1.1

8

123.1

25.7

1.1

5

199.3

0.2

5.1

65

192.2

43.5

5.1

34

61.2

30.2

48.1

66

101.1

39.1

06.2

30

42.6

7.1

00.1

44

36.2

38.9

5.1

58

114.2

4.7

3.1

53

66.2

20.1

46.2

6

211.2

3.2

51.1

57

118.1

70.2

05.1

70

220.1

81.1

08.1

78

65.5

5.2

10.9

0

211.1

04.1

60.1

64

157.5

5.3

9.2

5

208.4

3.2

25.8

4

151.8

0.3

1.1

64

40.7

7.1

67.8

9

150.7

0.1

73.4

1

69.3

0.1

98.2

42

192.2

43.5

5.1

33

207.4

6.1

3.7

0

66.1

02.6

.250

157.5

5.3

9.8

65.5

5.2

10.2

48

59.1

15.1

66.7

3

61.6

5.1

4.4

9

36.2

32.2

18.3

0

40.7

7.1

67.1

04

36.2

27.2

34.9

4

111.2

54.1

66.1

28

140.1

12.7

7.6

118.1

60.2

38.2

9

140.1

19.5

3.2

0

83.168.250.50

39.12.198.8

110.30.221.76

91.194.84.115

Evidence IPsBlue: True PositiveGreen: Suspicious

Miss 120.236.174.19

Initial Threat Seed Finding:

with priori knowledge about the threat pattern

62

Ni : The PR Score of ith IPNmax : The Maximum PR Score

PAGERANK

Evidence IPs

Blue: True PositiveGreen: Suspicious

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

120.236.174.190

91.194.84.115

83.168.250.50 Never Miss

Initial Threat Seed Finding:

with priori knowledge about the threat pattern

63

TENSORc0

c1

83.168.250.50

120.236.174.190

91.194.84.115

c2

83.168.250.50

120.236.174.190

91.194.84.115

c0

c1

c2

83.168.250.50

120.236.174.190

91.194.84.115

• 83.168.250.50 Only Query ./• # Query is similar to normal

120.236.174.190

83.168.250.50

91.194.84.115

We change same # Query of

83.168.250.50

with others

Benign

Malicious

Initial Threat Seed Finding: without priori knowledge

• Use 120.236.174.190 and 91.194.84.115 (Tensor Results Outlier ) as

Start Points for Random Walk with Restart (RWR)

• We expect that 83.168.250.50 (in Unknown Data) can be detected by

RWR

• Input:

– |IPs| x |IPs| | Matrix

• nij: num of queried same (Filem, IllegalCharn) by IPi and IPj

• 0: Never Queried by (Filem, IllegalCharn)

IP X IP

IP1

IP2

.

.

.

IPN

IP1 IP2… IPN

n11 0 … n1N

n21 0 … 0

0 0 … nNN

c1

c2

83.168.250.50

120.236.174.190

91.194.84.115

Unknown Data

Threat Seed Propagation – Given Data- or Knowledge-Driven Threat Seeds

Ni : The RWR Score of ith IPNmax : The Maximum PR Score

0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

Threat Seed Propagation – Given Data- or Knowledge-Driven Threat Seeds

RWR

Ni : The RWR Score of ith IPNmax : The Maximum PR Score

0

0.02

0.04

0.06

0.08

0.1

0.12

207.4

6.1

3.7

366.2

49.7

5.2

48

39.1

2.1

98.8

110.3

0.2

21.7

666.2

49.7

1.1

28

66.2

49.7

1.2

48

40.7

7.1

67.1

04

208.4

3.2

25.8

5101.2

26.1

68.2

36

69.3

0.2

11.2

51.2

54.1

31.2

45

61.1

35.1

89.1

24

69.3

0.1

98.2

42

192.2

43.5

5.1

36

63.1

41.2

26.1

78

192.2

43.5

5.1

32

52.1

92.9

.202

52.1

96.1

4.2

48

211.1

04.1

60.1

64

173.2

08.1

69.4

2207.4

6.1

3.3

195.1

54.1

99.2

35

62.2

10.1

43.2

45

69.3

0.1

98.1

86

203.1

04.1

45.5

9192.2

43.5

5.1

29

157.5

5.3

9.1

92

178.6

3.1

3.1

566.2

49.7

1.2

54

192.2

43.5

5.1

31

138.2

01.3

0.6

6144.7

6.2

9.6

669.3

0.2

13.8

261.2

31.1

93.1

39

66.2

20.1

45.2

45

122.1

16.2

20.5

266.2

20.1

56.1

00

101.2

26.1

68.2

54

69.5

8.1

78.5

669.8

4.2

07.2

46

89.1

63.1

48.5

8142.4

.216.1

72

208.1

15.1

11.7

585.9

3.8

9.7

4208.1

15.1

13.9

168.1

80.2

30.4

791.1

94.8

4.1

15

83.1

68.2

50.5

0120.2

36.1

74.1

90

Threat Seed Propagation – Given Data- or Knowledge-Driven Threat Seeds

• Case 1 : IIS Logs – System Compromise• Case 2 : Apache Logs – SQL Injection for

Data Leakage• Case 3 : Apache Logs – System

Compromise67

Case 3: Suspicious Activity Graph Construction - Apache

File Name IP Address Status Code Illegal Character Hacker’s Action (File)

Hacker’s Action (IP Address

• Scope of Attacking

– Each IP correlates multiple files with illegal character.

68

Scope of Attacking

Case 3: Is role of IP different from others file hacker used in Suspicious Activity Graph ?1 -

Apache• Problem:– Investigate the role of IP Address in Suspicious Activity Graph

• Input:– |IPs| x |IPs| Matrix• nij: num of Queried same File

– IPi to IPj• 0: Queried by different File

• Output:– Clustering Result IP X IP

IP1

IP2

.

.

.

IPN

IP1 IP2… IPN

n11 0 … n1N

n21 0 … 0

0 0 … nNN

69

Case 3: Is role of IP different from others file hacker used in Suspicious Activity Graph ?1 -

Apache (Cont’d)

70

SVD

Benign

Malicious

c1

c0

61.209.216.6

223.75.11.56

220.133.111.221

Benign192.225.226.197192.225.226.196

c0

c2

c2

c1

61.209.216.6

223.75.11.56

220.133.111.221

Benign192.225.226.197192.225.226.196

61.209.216.

Benign192.225.226.197192.225.226.196

223.75.11.56

220.133.111.221

Case 3: Why using Propagation to evaluate Suspicious IPs?! - Apache

• The correlation between Files and IPs can’t be evaluate in decomposition SVD

• We mention that evidence have the following patterns:

71

Patterns

IP

File

File

File

File

File

File

File

Case 3: Why using Propagation to evaluate Suspicious IPs?! - Apache (Cont’d)

72

• Design a directed homogeneous graph to illustrate above patterns we found

• In this pattern, we expect to obtain high score IPs from graph when using propagation algorithm– Pagerank

IP

File

File

File

File

File

File

File

73

Case 3: Why using Propagation to evaluate Suspicious IPs?! (Cont’d) – High Score IPs

Ni : The PR Score of ith IPNmax : The Maximum PR Score

PAGERANK

Evidence IPsBlue: True PositiveGreen: Suspicious

Miss192.225.226.197192.225.226.196

0.00E+00

5.00E-02

1.00E-01

1.50E-01

2.00E-01

2.50E-01

3.00E-01

101.1

2.1

03.4

5219.8

5.1

05.6

859.1

15.1

78.2

51

223.1

41.6

6.2

00

220.1

36.2

22.1

23

119.2

46.2

1.3

6220.1

30.1

87.3

149.2

14.1

77.1

17

114.4

1.1

82.1

110.2

7.3

.241

60.2

46.1

31.1

37

140.1

27.2

33.1

839.1

5.1

28.3

5118.1

66.1

80.2

40

123.1

25.7

1.6

01.1

75.6

2.1

26

123.1

25.7

1.2

8125.2

27.1

16.2

48.3

7.2

35.6

136.2

31.1

08.5

1219.8

4.2

36.1

22

140.1

12.2

4.1

05

140.1

27.2

52.5

2114.3

8.8

2.5

239.1

2.2

08.9

042.7

2.7

4.2

5113.2

55.1

09.9

9220.1

34.2

04.1

39

1.1

63.7

8.2

45

70.1

18.2

06.1

48

123.1

94.2

9.1

06

101.2

26.1

66.2

51

223.1

39.1

18.9

6220.1

33.1

16.8

8140.1

12.1

39.4

866.2

49.7

7.1

11

61.2

28.1

88.2

30

49.1

59.2

7.2

11

220.1

81.1

08.1

76

123.1

94.2

08.1

11

218.1

66.1

33.1

94

220.1

81.1

08.1

87

180.2

06.1

3.6

7112.1

05.5

2.2

28

151.8

0.4

1.1

69

65.2

08.1

51.1

17

91.1

94.8

4.1

06

210.6

8.5

3.1

12

61.2

19.7

0.1

66

223.75.11.56

220.133.111.221

61.209.216.6

74

CloudOrion Cloud Threats Analytics

Men in The Cloud Detection

75

File of Team A File of Team B

Cloud Drive

Team A Team B

CloudOrion- Google Drives Logs• Analyzing cloud user behavior

76

Home Mobile Office

Access Log

Scenario PolicyAdministrator

Visualization Threats Explorer

• Intuition to aware the potential threats in cloud

• Drill-down and quick response visualization analytics

– Full and deep views between cloud resources and users

behavior

User and Entity

Behavior Analysis

User access log

Danger Zone

77

CTA UEBA

Collaborative Filtering

Temporal Analysis

On Duty

Off Duty

MaliciousBehavio

r

Spatial Analysis

78

Deep LearningTime-Series Analysis

Using machine learning and artificial

intelligence that analyze cloud logs via

API sets CTA in cloud apart from state-of-

the-art cloud security solution

Scenario Policy Enforcement

• Manage Google Drive and OAuth Tokens• Decreasing risks of data leakage via Scenario Policy

deployment

2016-07-01 18:10:23GMTModify a file in Drive

from Taiwan IP

2016-07-01 18:20:24GMTLogin from US IP

Super Travel Policy

Detecting account compromising by 10+

scenario policies

79

Abnormal Behavior Detection

• Construct user behavior model over files in Drive

– User and Entity Behavior (UEBA)

• Quick identify potential threats

User access log

Drive file proximity score measuring

Structure-oriented risk propagation

Reversed collaborative filtering from file access

behavior

Users past access behavior matrix

User recent access behavior vector

Edit, view and other behavior

Detecting insider

suspect with high-

risk behavior

80

Case Study

Inspected more than 1200 user accounts with

more than 400,000 event logs every day

About 100,000 distinct user-file access

records every day

Mantain 1200 increamental

individual behavior models

Find out anomaly access

record from humongous

of logs

Automatically

81

Conclusions• Easy start 1-2-3– Anaconda Python (https://www.continuum.io/downloads)• iPython notebook!!

– Scikit-learn (http://scikit-learn.org/stable/)– Tensor Flow (https://www.tensorflow.org/)– Bokeh (http://bokeh.pydata.org/en/latest/)– NetworkX (https://networkx.github.io/)– ElasticSearch (https://www.elastic.co/)

82

Conclusions• Good Tutorial– Document Clustering with Python

– http://brandonrose.org/clustering• Using Machine Learning to Track IoT

Vulnerability and Threats– Unstructured data– Structure data– Together them

83

Thanks & Welcome Collaboration• [email protected]• Ching-Hao, Eric, Mao Ph. D.• CTTI,III, Taiwan

84

85