Applying Soft Computing Techniques to Corporate Mobile Security Systems
Máster en Ingeniería de Computadores y Redes (Master's in Computer and Network Engineering)
Paloma de las Cuevas Delgado
Supervised by Drs.: Antonio Miguel Mora García, Juan Julián Merelo Guervós

DESCRIPTION

Corporate workers increasingly use their own devices for work purposes, a trend that has come to be called the "Bring Your Own Device" (BYOD) philosophy, and companies are starting to include it in their policies. For this reason, corporate security systems need to be redefined and adapted to these emerging behaviours by the corporate Information Technology (IT) department. This work proposes applying soft-computing techniques to help the Chief Security Officer (CSO) of a company (in charge of the IT department) improve the security policies. The actions performed by company workers in a BYOD situation will be treated as events: an action or set of actions leading to a response. Some of those events might cause non-compliance with some corporate policies, so it would be necessary to define a set of security rules (action, consequence). Furthermore, processing the extracted knowledge will allow the rules to be adapted.

Index

1. Research context.
2. Underlying problem and objectives.
3. Data description and preprocessing.
4. Experimental setup.
5. Experiments and results.
6. Conclusions and scientific contributions.
7. Future Work.


Research Context

● Bring Your Own Device (BYOD) problem
● MUSES server

Underlying Problem

● Enterprise security applied to employees' connections to the Internet (URL requests).
● Security? How?
  ○ Proxy
  ○ Blacklists (lists of URLs that are not permitted)
  ○ Whitelists (lists of URLs that are permitted)
  ○ Firewalls
  ○ Elaboration of corporate security policies
● The aim of this research is to go a step beyond.

Objectives

● Objective → to obtain a tool that automatically makes an allow or deny decision for URLs that are not included in the black/whitelists.
  ○ This decision would be based on the one made for similar URL accesses (those with similar features).
  ○ The tool should consider other parameters of the request in addition to the URL string.

Followed Schema

1. Data mining process
   a. Parsing
   b. Preprocessing
2. Labelling process (requests labelled as ALLOW or DENY)
3. Machine learning
4. Studying classification accuracies

Working Scenario

● Employees requesting access to URLs during the workday (records from an actual Spanish company with around 100 employees).
● We had access to a log file of 100k entries (patterns) covering two hours (8 - 10 am), in CSV format.
● We were also provided with a set of rules (a specification of the security policies as if-then clauses).

Data description

● An entry (unlabelled)

    http_reply_code:          200
    http_method:              GET
    duration_miliseconds:     1114
    content_type:             application/octet-stream
    server_or_cache_address:  X.X.X.X
    time:                     08:30:08
    squid_hierarchy:          DEFAULT_PARENT
    bytes:                    106961
    url:                      http://www.one.example.com
    client_address:           X.X.X.X

● A policy and a rule

    Policy: “Video streamings cannot be reproduced”

    rule "policy-1 MP4"
        attributes
        when
            squid:Squid(dif_MCT=="video", bytes>1000000,
                content_type matches "*.application.*",
                url matches "*.p2p.*")
        then
            PolicyDecisionPoint.deny();
    end

Data description

● An entry
  ○ Has 7 categorical fields and 3 numerical fields.
● A rule
  ○ Has a set of conditions and a decision (ALLOW/DENY).
  ○ Each condition has three parts:
    ■ Data type (e.g. bytes)
    ■ Relationship (e.g. <)
    ■ Value (e.g. 1000000)

Tools used during this research

● Drools and Squid syntax for the rules; CSV format for the log data.
● Weka, which offers a large, state-of-the-art set of classifiers.
● Two implementations:
  ○ Perl → faster in the parsing process, slower in the labelling process and in the use of Weka.
  ○ Java → native integration with Weka, better for automation, and it will be embedded in an actual Java project (MUSES).

After the parsing process

● A hash with the entries
  ○ Keys → entry fields
  ○ Values → field values
● A hash with the set of rules
  ○ Keys → condition fields, and decision
  ○ Values → name of the data type, its desired value, the relationship between them, and allow or deny.

    %logdata = (
        entry => {
            http_reply_code         => 'xxx',
            http_method             => 'xxx',
            duration_miliseconds    => 'xxx',
            content_type            => 'xxx',
            server_or_cache_address => 'xxx',
            time                    => 'xxx',
            squid_hierarchy         => 'xxx',
            bytes                   => 'xxx',
            url                     => 'xxx',
            client_address          => 'xxx',
        },
    );

    %rules = (
        rule => {
            field    => 'xxx',
            relation => 'xxx',
            value    => 'xxx',
            decision => 'allow',    # either 'allow' or 'deny'
        },
    );
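
The entry hash built by the parsing step can be illustrated with a minimal Python sketch. It is not the Perl or Java code used in this work. The column order (taken from the field listing on the data description slide), the comma delimiter, the absence of a header row, and the choice to convert only `bytes` and `duration_miliseconds` to integers are all assumptions.

```python
import csv
import io

# Field names as listed on the data description slide. The column order
# and the lack of a header row are assumptions made for this sketch.
FIELDS = [
    "http_reply_code", "http_method", "duration_miliseconds", "content_type",
    "server_or_cache_address", "time", "squid_hierarchy", "bytes", "url",
    "client_address",
]
# Assumption: only these two fields are converted to integers here.
INT_FIELDS = ("duration_miliseconds", "bytes")


def parse_log(text):
    """Turn CSV log text into a list of dicts, one dict per request."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != len(FIELDS):
            continue  # skip empty or malformed lines
        entry = dict(zip(FIELDS, row))
        for name in INT_FIELDS:
            entry[name] = int(entry[name])
        entries.append(entry)
    return entries


sample = (
    "200,GET,1114,application/octet-stream,X.X.X.X,08:30:08,"
    "DEFAULT_PARENT,106961,http://www.one.example.com,X.X.X.X\n"
)
entries = parse_log(sample)
```

Each element of `entries` plays the role of one `entry` hash: the keys are the field names and the values are the field values.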

Labelling Process

● The two hashes are compared during the labelling process.
● The conditions of each rule are checked against each entry.
● If an entry meets all the conditions of a rule, it is labelled with the corresponding decision of that rule.
● A key-value pair holding the decision is added to the hash of that entry.
● Conflict resolution, for the case in which both of the following hold:
  ○ The entry meets the conditions of a rule that allows making the request.
  ○ The entry meets the conditions of a rule that denies making the request.
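
The labelling step described above can be sketched in Python as follows. The rule set, the supported relations, and in particular the conflict policy (DENY wins when both kinds of rule match) are assumptions made for illustration; the slides do not state how conflicts were resolved.

```python
import operator
import re

# Relations a condition may use; this set is an assumption for the sketch.
RELATIONS = {
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "matches": lambda value, pattern: re.search(pattern, str(value)) is not None,
}


def meets(entry, condition):
    """Check one (field, relation, value) condition against an entry."""
    field, relation, value = condition
    return field in entry and RELATIONS[relation](entry[field], value)


def label(entry, rules):
    """Return 'ALLOW', 'DENY', or None if no rule covers the entry."""
    decisions = {
        decision
        for conditions, decision in rules
        if all(meets(entry, c) for c in conditions)
    }
    if not decisions:
        return None  # not covered by the rules, so it stays unlabelled
    # Assumption: DENY takes precedence when both ALLOW and DENY rules match.
    return "DENY" if "DENY" in decisions else "ALLOW"


# Hypothetical rules, each a list of conditions plus a decision.
rules = [
    ([("bytes", ">", 1000000), ("content_type", "matches", r"video")], "DENY"),
    ([("http_method", "==", "GET")], "ALLOW"),
]
small = {"http_method": "GET", "bytes": 106961,
         "content_type": "application/octet-stream"}
big_video = {"http_method": "GET", "bytes": 2000000, "content_type": "video/mp4"}
other = {"http_method": "POST", "bytes": 10, "content_type": "text/html"}
```

Here `big_video` matches both rules, which is exactly the conflict case listed on the slide; a different precedence only requires changing the last line of `label`.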

Data Summary

● The CSV file, now containing only the patterns that could be labelled (the rest were not covered by the rules), has 57502 entries/patterns, roughly a 2:1 ratio:
  ○ 38972 with an ALLOW label.
  ○ 18530 with a DENY label.
● Data balancing techniques might be needed:
  ○ Undersampling: random removal of patterns in the majority class.
  ○ Oversampling: duplication of each pattern in the minority class.
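
The two balancing techniques can be sketched in Python with the class sizes reported above. Undersampling down to exactly the minority size, and reading "duplication" as keeping two copies of each minority pattern, are assumptions of this sketch; the patterns themselves are placeholders.

```python
import random


def undersample(majority, minority, seed=0):
    """Randomly keep as many majority patterns as there are minority ones."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)) + list(minority)


def oversample(majority, minority):
    """Keep the majority class and include every minority pattern twice."""
    return list(majority) + [p for p in minority for _ in range(2)]


# Placeholder patterns with the class sizes from the data summary.
allow = [("ALLOW", i) for i in range(38972)]
deny = [("DENY", i) for i in range(18530)]

under = undersample(allow, deny)   # 18530 ALLOW + 18530 DENY
over = oversample(allow, deny)     # 38972 ALLOW + 37060 DENY
```

With these numbers, undersampling discards 20442 ALLOW patterns, while oversampling brings the classes to 38972 versus 37060, which is close to balanced without discarding anything.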

Experimental Setup

● First, the classifiers are tested with a 10-fold cross-validation process.
  ○ The top five classifiers in accuracy are chosen for the following experiments.
  ○ The Naïve Bayes classifier is also taken as a reference.
● Second, the initial (labelled) log file is divided into training and test files.
● These training and test files are created with different ratios, taking the entries either randomly or sequentially.
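
The difference between the random and the sequential division can be shown with a small Python sketch. The function name and parameters are hypothetical, and integers stand in for log entries.

```python
import random


def split(entries, train_ratio, shuffle, seed=0):
    """Divide entries into (training, test) lists.

    shuffle=False keeps the order of the log file (sequential division);
    shuffle=True takes the entries randomly.
    """
    items = list(entries)
    if shuffle:
        random.Random(seed).shuffle(items)
    cut = round(len(items) * train_ratio)
    return items[:cut], items[cut:]


entries = list(range(100))  # stand-in for 100 labelled log entries
train_seq, test_seq = split(entries, 0.8, shuffle=False)
train_rnd, test_rnd = split(entries, 0.9, shuffle=True)
```

In the sequential case the test file always holds the last requests of the log, so it only contains traffic that is later in time than anything in the training file.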

Flow Diagram

1) Initial labelling process. Experiments with unbalanced and balanced data. From those, divisions are made, both randomly and sequentially:
● 80% training, 20% testing
● 90% training, 10% testing

2) Removal of duplicated requests. Experiments with unbalanced data. From those, divisions are made, both randomly and sequentially:
● 80% training, 20% testing
● 90% training, 10% testing
● 60% training, 40% testing

3) Enhancing the creation of training and test files. Experiments with unbalanced data. From those, divisions are made, with patterns taken randomly:
● 80% training, 20% testing
● 90% training, 10% testing
● 60% training, 40% testing

4) Filtering the features of the URL. Experiments with unbalanced and balanced data. From those, divisions are made, with patterns taken randomly:
● 80% training, 20% testing
● 90% training, 10% testing
● 60% training, 40% testing

First set of experiments
1) Initial labelling process.

● First, the classifiers are tested with a 10-fold cross-validation process over the balanced data.
● Then, Naïve Bayes and the top five classifiers are tested with separate training and test divisions, so that patterns used for testing are never used for training and vice versa.

[Results table: divisions made over unbalanced data]
[Results table: divisions made over balanced data (undersampling)]
[Results table: divisions made over balanced data (oversampling)]

Second set of experiments
2) Removal of duplicated requests.

● We studied the field squid_hierarchy and saw that it had two possible values: DIRECT or DEFAULT_PARENT.
● Connections are made first to the Squid proxy and then, if appropriate, the request continues to another server.
  ○ As a result, some of the entries were repeated, and the results may be affected by that.

[Results table: divisions made over unbalanced data]
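
One way to remove the repeated requests mentioned above is sketched below in Python. The slides do not say how duplicates were identified, so the key used here (same client, URL, time and HTTP method, keeping the first occurrence) is purely an assumption, and the sample entries are invented.

```python
def remove_duplicates(entries,
                      key_fields=("client_address", "url", "time", "http_method")):
    """Keep the first entry seen for each combination of key fields.

    Assumption: two log lines describe the same request when they agree
    on all key_fields, whatever their squid_hierarchy value is.
    """
    seen = set()
    unique = []
    for entry in entries:
        key = tuple(entry[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


# Invented entries: the first two differ only in squid_hierarchy.
log = [
    {"client_address": "10.0.0.1", "url": "http://www.one.example.com",
     "time": "08:30:08", "http_method": "GET", "squid_hierarchy": "DEFAULT_PARENT"},
    {"client_address": "10.0.0.1", "url": "http://www.one.example.com",
     "time": "08:30:08", "http_method": "GET", "squid_hierarchy": "DIRECT"},
    {"client_address": "10.0.0.2", "url": "http://www.one.example.com",
     "time": "08:30:09", "http_method": "GET", "squid_hierarchy": "DIRECT"},
]
deduplicated = remove_duplicates(log)
```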

Third set of experiments
3) Enhancing the creation of training and test files.

● Repeated URL core domains could lead to misleading results.
● During the division process, we ensured that requests with the same URL core domain went to the same file (either for training or for testing).
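
A division that keeps every core domain in a single file can be sketched in Python as below. The way the core domain is extracted (the label just left of the top-level domain, which mishandles suffixes such as `co.uk`), the requirement that URLs carry a scheme, and the greedy filling strategy are assumptions of this sketch, so the resulting ratio is only approximate.

```python
import random
from urllib.parse import urlparse


def core_domain(url):
    """Naive core domain: the host label just left of the top-level domain."""
    labels = urlparse(url).hostname.split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def split_by_core_domain(entries, train_ratio, seed=0):
    """Divide entries so that no core domain appears in both files."""
    groups = {}
    for entry in entries:
        groups.setdefault(core_domain(entry["url"]), []).append(entry)
    domains = sorted(groups)
    random.Random(seed).shuffle(domains)
    target = train_ratio * len(entries)
    train, test = [], []
    for domain in domains:
        # Whole groups are assigned at once, filling the training file first.
        (train if len(train) < target else test).extend(groups[domain])
    return train, test


urls = [
    "http://www.dropbox.com/a", "http://dl.dropbox.com/b",
    "http://www.facebook.com/", "http://archive.ubuntu.com/x",
    "http://www.ubuntu.com/",
]
train, test = split_by_core_domain([{"url": u} for u in urls], 0.6)
```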

Created Rules During Classification

● In the experiments that included only the URL core domain as a classification feature, the rules were too focused on that feature.

    PART decision list
    ------------------
    url = dropbox: deny (2999.0)
    url = ubuntu: allow (2165.0)
    url = facebook: deny (1808.0)
    url = valli: allow (1679.0)

● Other kinds of rules were found, but they always depended on the URL core domain.

    url = grooveshark AND
    http_method = POST: allow (733.0)

    url = googleapis AND
    content_type = text/javascript AND
    client_address = 192.168.4.4: allow (155.0/2.0)

    url = abc AND
    content_type_MCT = image AND
    time <= 31532000: allow (256.0)

Fourth set of experiments
4) Filtering the features of the URL.

● The rules created by the classifiers are too focused on the URL core domain feature.
● We repeated the experiments with the original file, but including only the Top Level Domain (TLD) of the URL as a feature, and not the core domain.

[Results table: divisions made over unbalanced data]
[Results table: divisions made over balanced data]
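
Replacing the URL with its top-level domain can be sketched in Python as follows. The feature name `tld` matches the one that appears in the rules obtained in these experiments; the extraction itself (last dot-separated label of the host name, URLs with a scheme) is an assumption of this sketch.

```python
from urllib.parse import urlparse


def tld(url):
    """Naive top-level domain: the last dot-separated label of the host."""
    return urlparse(url).hostname.rsplit(".", 1)[-1]


def replace_url_with_tld(entry):
    """Return a copy of the entry with the url field replaced by a tld field."""
    filtered = {k: v for k, v in entry.items() if k != "url"}
    filtered["tld"] = tld(entry["url"])
    return filtered


entry = {"http_method": "GET", "bytes": 106961,
         "url": "http://www.one.example.com"}
filtered = replace_url_with_tld(entry)
```

Because the core domain is no longer available, a classifier trained on `filtered` entries has to rely on the remaining request features.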

Created Rules During Classification

● After including the URL top-level domain as a classification feature, instead of the URL core domain, the rules classify mainly by server address.

    PART decision list
    ------------------
    server_or_cache_address = 173.194.34.248: allow (238.0/1.0)
    server_or_cache_address = 8.27.153.126: allow (235.0/2.0)
    server_or_cache_address = 91.121.155.13: deny (235.0)
    server_or_cache_address = 66.220.152.19: deny (201.0)

● The URL TLD appears, but now the rules do not always depend on this feature.

    server_or_cache_address = 90.84.53.48 AND
    client_address = 10.159.39.199 AND
    tld = es AND
    time <= 31533000: allow (138.0/1.0)

    content_type = application/octet-stream AND
    tld = com AND
    server_or_cache_address = 192.168.4.4 AND
    client_address = 10.159.86.22: allow (210.0)

    server_or_cache_address = 90.84.53.19 AND
    tld = com: deny (33.0/1.0)
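
In Weka's PART output, the numbers in parentheses after a rule give the number of training instances covered by the rule and, after the slash, how many of those it misclassifies. The Python sketch below reads those figures from the last line of a rule; the helper name and regular expression are hypothetical and only handle the `allow`/`deny` decisions shown above.

```python
import re

# Matches the tail of a rule, e.g. ": allow (238.0/1.0)" or ": deny (235.0)".
RULE_TAIL = re.compile(
    r":\s*(allow|deny)\s*\((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)\s*$"
)


def rule_stats(line):
    """Return (decision, covered, errors, accuracy) or None if no match."""
    match = RULE_TAIL.search(line)
    if match is None:
        return None
    decision = match.group(1)
    covered = float(match.group(2))
    errors = float(match.group(3) or 0)
    return decision, covered, errors, (covered - errors) / covered


first = rule_stats("server_or_cache_address = 173.194.34.248: allow (238.0/1.0)")
third = rule_stats("server_or_cache_address = 91.121.155.13: deny (235.0)")
```

For the first rule this gives 237 correctly classified instances out of 238, and for the third rule all 235 covered instances are classified correctly.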

Conclusions

● In most cases, the Random Forest classifier is the one that yields the best results.
● The loss of information when analysing a log of URL requests lowers the results. This happens when:
  ○ Undersampling data (because we randomly remove data).
  ○ Keeping the sequence of the requests of the initial log file while dividing it into training and test files.
● For future experiments, it should be ensured that the same URL lexical features (like the core domain) do not appear in both training and test files at the same time.
  ○ Otherwise the results are distorted.
● As seen in the rules obtained, it is possible to develop a tool that automatically makes an allow or deny decision for URL requests, with that decision depending on other features of the request and not only on the URL.

Scientific Contributions

● MUSES: A corporate user-centric system which applies computational intelligence methods, at the ACM SAC conference, Gyeongju, Korea, March 2014.
● Enforcing Corporate Security Policies via Computational Intelligence Techniques, at the SecDef Workshop at GECCO, Vancouver, July 2014.
● Going a Step Beyond the Black and White Lists for URL Accesses in the Enterprise by means of Categorical Classifiers, at ECTA, Rome, Italy, October 2014.

Future Work

● Run experiments with bigger data sets (e.g. a whole workday).
● Include more lexical features of a URL in the experiments (e.g. number of subdomains, number of arguments, or the path).
● Consider sessions when classifying.
  ○ A session is defined as the set of requests made from a certain client during a certain time.
● Finally, implement a system and test it with real data, in real time.
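
The lexical features proposed as future work could be extracted along the lines of the Python sketch below. This was not part of the reported experiments. Counting subdomains as the host labels beyond the core domain and the TLD, and counting arguments as query-string parameters, are assumptions; the example URL is invented.

```python
from urllib.parse import parse_qsl, urlparse


def lexical_features(url):
    """Extract a few lexical features from a URL that includes a scheme."""
    parts = urlparse(url)
    labels = parts.hostname.split(".")
    return {
        # Assumption: everything left of "core.tld" counts as a subdomain.
        "num_subdomains": max(len(labels) - 2, 0),
        "num_arguments": len(parse_qsl(parts.query, keep_blank_values=True)),
        "path": parts.path,
    }


features = lexical_features(
    "http://www.one.example.com/videos/watch?id=7&lang=es"
)
```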

Thank you for your attention

Questions?

[email protected] @unintendedbear