SITUATIONAL AWARENESS, BOT-NET AND MALWARE DETECTION IN THE MODERN ERA
Machine Learning Enabled Advanced Security
CodeMotion Milan 2016
Davide Papini
Introduction ML for Cyber Security Final Remarks
ABOUT ME
x Research & Innovation @Elettronica S.p.A.
x Postdoc @ISG Royal Holloway, UK, on ML applied to cyber situational awareness.
x M.Sc. Telecommunication Engineering @Politecnico di Milano:
  → Erasmus @Danmarks Tekniske Universitet
  → Master Thesis on "Anomaly Based Wireless Intrusion Detection Systems"
x Ph.D. @Danmarks Tekniske Universitet:
  → "Attacker Modeling in Ubiquitous Computing Systems"
  → External stay at COSIC, KU Leuven
WHAT THIS TALK IS ABOUT
Topics:
x Applications of ML in Cybersecurity research.
x Successful research: botnets, DGAs, early malware detection.
x ML traps.
x Evaluation metrics.
NOT about:
x New ML algorithms.
x Showing one specific Security-ML based application.
x Wearing you out with math.
MOTIVATIONAL SLIDE
x Control of the botnet for 10 days: 180,000 infections, over 70 GB of data recorded.
x Torpig intercepts and records keystroke information at a low level, targeting a wide variety of applications and websites.
x It steals financial and personal information, login credentials for social networking, etc.
x Torpig periodically uploads any new data that it has captured to a central server.
x The researchers were able to infiltrate the botnet by registering one of the domains from the list of potential ones that infected machines use.
SOME STATISTICS
https://www.mcafee.com/us/resources/reports/rp-quarterly-threats-sep-2016.pdf
x 450,000 new malware samples per day.
x 20,000 of these are mobile malware.
x Includes: ransomware, botnets, rootkits, trojans …
NEED A GAME CHANGER
Modern malware/intrusions are difficult to detect/block:
x Code obfuscation, polymorphism and packing.
x Malware written ad hoc for specific targets.
x AVs are mainly signature-based.
x URL blacklists cannot be updated fast enough.
x Local changes are often too small/subtle to be detected.
x Logs contain a lot of noise (≃ 90%).
Need for intelligent approaches:
x Adapt to unforeseen "events".
x Learn from data, i.e. extract behaviours NOT signatures.
x Leverage global knowledge.
x Can be quasi-real-time.
ML FOR CYBER SECURITY
Machine learning has been applied to many fields in security:
x Botnet detection and classification
x Mobile application analysis
x Spam detection and campaign analysis
x Situational awareness through network traffic analysis
x Detection of malware downloads
x and many more...
Also in many flavours:
x Supervised
x Unsupervised
x Combinations of those
BOTNETS
x Situational awareness: knowledge of the health status of a network (e.g. malware infections, intrusions and data exfiltration).
x Botnet: a network of bots (drones), i.e. programs installed on the machines of unwitting Internet users and receiving commands from a bot controller.
BOTNETS C&C CHANNEL
Bots connect to C&C Server in three ways:
x Hard-coded IP: Bot → 1.2.3.4
x Hard-coded domain: Bot → badguy.ru → 1.2.3.4
x Automatically Generated Domains (AGDs):
  → Bot cycles through time-dependent domains.
  → Domain names are generated using a Domain Generation Algorithm (DGA).
  → The botmaster needs to register only one of those domains.
jhhfghf7.tk faukiijjj25.tk pvgvy.tk
cvq.com epu.org bwn.org
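The idea of time-dependent generated domains can be illustrated with a toy sketch. This is not the algorithm of Torpig or of any real botnet: the function name `toy_dga`, the seed string, the hashing scheme and the TLD list are all assumptions made for illustration. It only shows why a bot and its botmaster, sharing a seed and a clock, derive the same candidate list each day.

```python
import hashlib
from datetime import date


def toy_dga(seed, day, count=5, tlds=("tk", "com", "org")):
    """Derive `count` pseudo-random domain names from a shared seed and a date."""
    domains = []
    for i in range(count):
        # Both sides hash the same inputs, so both obtain the same names.
        material = f"{seed}-{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 10 hex digits to letters a..p to form the label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:10])
        domains.append(f"{label}.{tlds[i % len(tlds)]}")
    return domains


today = toy_dga("example-seed", date(2016, 11, 25))
tomorrow = toy_dga("example-seed", date(2016, 11, 26))
print(today)
print(tomorrow)
```

Because the list changes with the date, a static blacklist is always behind, while anyone who knows the algorithm and seed can predict the list in advance.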
[Figure: DNS lookups of generated domains, with NXDOMAIN answers for ahj.info and sjq.info; botnet shown: Torpig. Courtesy of E. Colombo, Cerberus.]
Sinkholing: if someone else has already registered the domain, the botmaster loses control of the botnet!
PHOENIX AND CERBERUS
Developed at Polimi and ISG@RHUL
System that relies on Machine Learning to identify DGAs:
x Leverages known malicious and benign domain names to build a classifier:
  → Distinguishes Human Generated Domains from AGDs.
  → Identifies the DGA used: botnets might share the same DGA.
x Uses unsupervised learning to identify new DGAs.
x Traffic comes from a national authoritative DNS server.
S. Schiavoni et al., Phoenix: DGA-Based Botnet Tracking and Intelligence. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 2014.
E. Colombo, Cerberus: Detection and Characterization of Automatically-Generated Malicious Domains. Master Thesis, Politecnico di Milano 2014.
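To make the "human generated vs automatically generated" distinction concrete, here is a minimal sketch of the kind of string features a classifier could be trained on. The features chosen (length, character entropy, vowel ratio, digit ratio) and the function name `domain_features` are illustrative assumptions; they are not taken from the Phoenix or Cerberus work.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def domain_features(domain):
    """Compute simple string statistics on the leftmost label of a domain."""
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        raise ValueError("empty domain label")
    counts = Counter(label)
    # Shannon entropy of the character distribution, in bits.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    letters = [ch for ch in label if ch.isalpha()]
    vowel_ratio = sum(ch in VOWELS for ch in letters) / len(letters) if letters else 0.0
    digit_ratio = sum(ch.isdigit() for ch in label) / n
    return {
        "length": n,
        "entropy": entropy,
        "vowel_ratio": vowel_ratio,
        "digit_ratio": digit_ratio,
    }


for name in ("badguy.ru", "jhhfghf7.tk"):
    print(name, domain_features(name))
```

Feature vectors like these could be fed to a supervised classifier for known DGAs, or to a clustering step to group domains that look like they come from the same, possibly unknown, generator.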
[Figure: Cerberus pipeline. Labelled elements: DNS Stream, Filtering, Bootstrap, Phoenix, Malicious Domains, Clusters, Classifier, Suspicious Domains, Time Detective, Detection. Courtesy of E. Colombo, Cerberus.]
CERBERUS FINDINGS
x 187 malicious domains detected and labeled
x 3,576 suspicious domains collected
x 47 clusters of DGA-generated domains discovered
x 319 new domains detected in the next 24 hours
MASTINO: REALTIME MALWARE DETECTION
Developed at Trend Micro and presented at Defcon London 2016
System for advanced realtime malware detection:
x Leverages global knowledge on download events
x Distinguishes malware from goodware
x Based on statistical evidence and graph analysis:
x Tripartite graph: URLs, Files, Machines
x Intrinsic features, e.g.
  → file: size, obfuscated, signed
  → url: FQD, e2LD, query path
  → machine: malware download history, processes
x Behaviour-based features:
  → Consider reputation of neighboring nodes
  → Help to classify unknown nodes
Huge work on feature engineering!
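The "reputation of neighboring nodes" idea can be sketched on a tiny tripartite graph. This is only an illustration under stated assumptions: the event tuples, the labels, and the helper names `build_graph` and `neighbor_reputation` are hypothetical, and the averaging rule is a stand-in, not Mastino's actual feature set.

```python
from collections import defaultdict

# Each download event links one machine, one URL and one file.
events = [
    ("machine-a", "http://good.example/setup.exe", "file-1"),
    ("machine-b", "http://good.example/setup.exe", "file-1"),
    ("machine-b", "http://bad.example/x.exe", "file-2"),
    ("machine-c", "http://bad.example/x.exe", "file-3"),
]
known_labels = {"file-1": 0.0, "file-2": 1.0}  # 0 = goodware, 1 = malware


def build_graph(events):
    """Adjacency sets between typed nodes: machines, URLs and files."""
    adjacency = defaultdict(set)
    for machine, url, file_id in events:
        nodes = [("machine", machine), ("url", url), ("file", file_id)]
        for a in nodes:
            for b in nodes:
                if a != b:
                    adjacency[a].add(b)
    return adjacency


def neighbor_reputation(file_id, adjacency, known_labels):
    """Mean label of known files sharing a URL or machine with file_id, else None."""
    related = set()
    for neighbor in adjacency[("file", file_id)]:
        for kind, name in adjacency[neighbor]:
            if kind == "file" and name != file_id:
                related.add(name)
    scores = [known_labels[f] for f in related if f in known_labels]
    return sum(scores) / len(scores) if scores else None


graph = build_graph(events)
print(neighbor_reputation("file-3", graph, known_labels))
```

Here the unlabeled `file-3` inherits a malicious-leaning score because it was served by the same URL as a known malware file; such a score would be one feature among many, alongside the intrinsic ones.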
MASTINO SYSTEM OVERVIEW
[Figure: System Overview. Copyright 2016 Trend Micro Inc. Courtesy of M. Balduzzi, Trend Micro.]
MASTINO TRAINING AND DETECTION
[Figure: Mastino training and detection. Courtesy of M. Balduzzi, Trend Micro.]
MASTINO RESULTS
Mastino evaluation:
x On the testing dataset: 95.8% TP, 0.5% FP
x Early detection experiment, deployed in the wild for 6 months:
  → Detected 84% of future malware
  → Verified later through VirusTotal
Detection time ≃ 0.16 s!
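As a reminder of how TP and FP rates are derived, here is a minimal sketch. The confusion counts are hypothetical numbers chosen only so that the rates match the percentages quoted above; they are not Mastino's actual counts.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) from confusion counts."""
    tpr = tp / (tp + fn)  # share of malware samples that were flagged
    fpr = fp / (fp + tn)  # share of goodware samples that were flagged
    return tpr, fpr


# Hypothetical counts: 1,000 malware and 1,000 goodware samples.
tpr, fpr = rates(tp=958, fn=42, fp=5, tn=995)
print(f"TP rate: {tpr:.1%}, FP rate: {fpr:.1%}")
```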
ISSUES
Traditional ML was developed for "natural" objects:
x Natural Language Processing.
x Image analysis, e.g. picture text search.
x Classification of plants and animals.
x Economic laws.
Metrics like ROC, FP and FN work very well in these cases; however, the cyber world is not natural:
x Things change abruptly, e.g. updates, new malware, new technologies.
x There is no clear evolutionary law.
x Change is deterministic and unpredictable.
x Behaviours change/slide over time.
ML TRAPS
Machine learning is often seen as a black-box panacea:
x Little is understood.
x Results with high accuracy are taken without questioning their quality.
However:
x Overfitting: if training and testing are not done carefully.
x Validity of results: a system that works on paper may not work in the field.
x Datasets: Variety vs Chronology
Need for novel metrics!
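One reading of the "Variety vs Chronology" trap is that a random train/test split can let the classifier train on samples newer than the ones it is tested on. The sketch below contrasts a random split with a time-ordered one; the sample records, the `first_seen` field and the helper names are assumptions made for illustration.

```python
import random
from datetime import date

# Hypothetical samples: two per month of 2016, each with a first-seen date.
samples = [{"id": i, "first_seen": date(2016, 1 + i % 12, 1)} for i in range(24)]


def random_split(samples, test_fraction=0.25, seed=0):
    """Shuffle, then cut: maximises variety but ignores time."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def chronological_split(samples, test_fraction=0.25):
    """Train on the oldest samples, test on the newest ones."""
    ordered = sorted(samples, key=lambda s: s["first_seen"])
    cut = int(len(ordered) * (1 - test_fraction))
    return ordered[:cut], ordered[cut:]


def leaks_future(train, test):
    """True if some training sample is newer than the oldest test sample."""
    return max(s["first_seen"] for s in train) > min(s["first_seen"] for s in test)


print("random split leaks future data:", leaks_future(*random_split(samples)))
print("chronological split leaks future data:", leaks_future(*chronological_split(samples)))
```

A time-ordered split is closer to deployment conditions, where the model only ever sees the past, at the cost of less variety in the training set.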
CONFORMAL EVALUATOR
Library developed at the Information Security Group at Royal Holloway:
x Evaluates algorithms in terms of confidence and credibility.
x Its core is the Non-Conformity measure, elicited directly from the algorithm, which in essence tells the difference between a sample and a set of samples.
x Builds decision and alpha assessments to evaluate the algorithm.
R. Jordaney, Z. Wang, D. Papini, I. Nouretdinov and L. Cavallaro, Misleading Metrics: On Evaluating Machine Learning for Malware with Confidence, Technical Report 2016-1, Royal Holloway University of London.
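A minimal sketch of how credibility and confidence can be derived from non-conformity scores, following the usual conformal-prediction definitions. It is not the library's API: the function names, the score values and the exact p-value formula (a plain ratio, without the common +1 correction) are assumptions.

```python
def p_value(test_score, class_scores):
    """Fraction of a class's samples at least as non-conforming as the test sample."""
    at_least = sum(1 for s in class_scores if s >= test_score)
    return at_least / len(class_scores)


def assess(test_scores, class_scores, predicted):
    """Return (credibility, confidence, p-values) for one test sample.

    test_scores[label] is the non-conformity of the test sample w.r.t. label;
    class_scores[label] holds the non-conformity scores of that label's samples.
    """
    p = {label: p_value(test_scores[label], scores)
         for label, scores in class_scores.items()}
    credibility = p[predicted]  # how well the sample fits the chosen label
    # Confidence: 1 minus the best p-value among the labels NOT chosen.
    confidence = 1.0 - max(v for label, v in p.items() if label != predicted)
    return credibility, confidence, p


# Hypothetical non-conformity scores for two botnet families.
class_scores = {
    "bifrose": [0.1, 0.2, 0.3, 0.4],
    "pushdo": [0.5, 0.6, 0.7, 0.8],
}
test_scores = {"bifrose": 0.25, "pushdo": 0.9}
credibility, confidence, p = assess(test_scores, class_scores, predicted="bifrose")
print(credibility, confidence, p)
```

A decision with high credibility and high confidence fits the chosen class well and fits no other class; low values on either side flag a prediction that should not be trusted, whatever the headline accuracy says.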
[Figure: Conformal Evaluator Overview. Components: Training and Testing Dataset; Similarity Based Classification/Clustering Algorithm; Non-Conformity Measure; Conformal Evaluator; Alpha Assessment; Decision Assessment.]
CE: EXAMPLE 1
System for Botnet detection and classification
[Figure: Decision Assessment. Two bar charts over the families bifrose, sasfis, blackenergy, banbra and pushdo (y-axis 0.0 to 1.0), each showing average algorithm credibility and average algorithm confidence. Correct choices, bar labels as extracted: 0.86, 0.27, 0.29, 0.31, 0.9, 0.2, 0.84, 0.18, 0.95, 0.15. Incorrect choices, bar labels as extracted: 0.42, 0.53, 0.58, 0.17, 0.68, 0.29, 0.62, 0.29, 0.73, 0.12.]
[Figure: Alpha Assessment. P-values (y-axis, 0.0 to 1.0) with respect to bifrose, sasfis, blackenergy, banbra and pushdo, plotted for the samples of each of the five families.]
Although the algorithm has reasonably good results on paper, CE shows that the quality of the results is not good!
We ran experiments on another dataset to confirm, and the classifier got worse.
CE: EXAMPLE 2
Mobile App classification: Malware vs Goodware
[Figure: Decision Assessment for correct vs incorrect choices (y-axis 0.0 to 1.0): average algorithm credibility and average algorithm confidence for each. Alpha Assessment: P-values with respect to MALICIOUS and BENIGN, plotted for MALICIOUS samples and BENIGN samples.]
FINAL REMARKS
Getting your hands in the game, what you need:
x You need to study a bit of ML
x You need a problem
x You need data
x You need good metrics
x In-the-wild analysis is a plus
x You need tools:
  → We did everything in Python: NumPy, SciPy
  → ML libraries: scikit-learn, shogun-toolbox.org
FINAL REMARKS
Machine Learning is great for Cyber Security!
Thanks for listening: Questions?