57
Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University of Louisiana at Lafayette [email protected] , [email protected] 7/22/2015 (C) 2015 U. Louisiana at Lafayette 1

Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Harnessing Intelligence from Malware

Repositories

Arun Lakhotia and Vivek NotaniSoftware Research Lab

University of Louisiana at [email protected] , [email protected]

7/22/2015 (C) 2015 U. Louisiana at Lafayette 1

Page 2: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Self Introduction

Software Research Lab 10 years research on Malware Graduate course on malware analysis Active interaction with industry Funded by AFOSR, ARO, DARPA, ONR, and

State of Louisiana

Research Focus How does malware evade detection? How to detect stealthy malware? Malware analysis in the large

Results Papers: 50+ peer-reviewed Patents: one granted Degrees: 6 Ph.D., 8 M.Sc. Research Funding: $5MM+

7/22/2015 (C) 2015 U. Louisiana at Lafayette 2

Page 3: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Targeted Attacks

7/22/2015 (C) 2015 U. Louisiana at Lafayette 3

Page 4: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

4

Machine Generation of Malware

Page 5: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

CyberSecurity Paradox

Physical world: FAILED attempts are INVESTIGATED

Cyber World: FAILED attempts are CELEBRATED

Page 6: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Extract Intelligence from

Malware

7/22/2015 (C) 2015 U. Louisiana at Lafayette 6

Connect families

Discover trends Malware evolution

Connect actors

Page 7: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University
Page 8: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Requirement: “Google” for Malware

8

“Google”

Malware Collection

0.90

0.82

0.76

0.30

Nearest Match

Query Malware

Page 9: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Challenges

9

❶ ❸❷ ❹

❶ Remove obfuscation

❷ Map to document

❸ Create index

❹ Search

Page 10: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Key Innovation: VM Introspection

in the Cloud

VM VM

HYPERVISOR

7/22/2015 (C) 2015 U. Louisiana at Lafayette 10

Page 11: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Key Innovation: Semantic

Fingerprints

(380091df) (0091df96) (91df96f6) (df96f633)

eax = def(ebp)ebp = -4+def(esp)esp = -8+def(esp)

memdw(-8+def(esp))= def(ebp)memdw(-4+def(esp))= def(ebp)memdw(4+def(esp)) = def(memdw(def(esp)))

STATE OF PRACTICE VIRUSBATTLE

Bit shreds Semantics

Fingerprint Fingerprint

Susceptible to obfuscation Obfuscation resistent7/22/2015 (C) 2015 U. Louisiana at Lafayette 11

Page 12: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Semantics Enabled: Connecting Malware

through Code

Aldibot

Ponyloader

Smokeloader

Darkcomet

Semantically similar binaries between malware families

7/22/2015 (C) 2015 U. Louisiana at Lafayette 12

Page 13: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Unpacking Malware

(C) 2015 U. Louisiana at Lafayette 137/22/2015

Page 14: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Challenge 1: Packing

7/22/2015 (C) 2015 U. Louisiana at Lafayette 14

Page 15: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Classes of Packers (Protectors)

• Classification parameter• Based on execution behavior

• When and how much of the original code is decrypted

• Traditional Packer• Entire original code is decrypted at one time

• Entire original code is in clear text before it is executed

• Paged Packer• Just-In-Time decryption of a page when it is executed

• Only a ‘page’ of the original code is in clear text at any time

• Virtual Machine Protectors• Decrypt a single instruction at a time

• None of the original code is ever in clear text

7/22/2015 (C) 2015 U. Louisiana at Lafayette 15

Page 16: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Unpacking: State-of-Practice

7/22/2015 (C) 2015 U. Louisiana at Lafayette 16

Malware

Windows Guest OS

UNPACKER CODE

Virtualization Stack

Page 17: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Innovation: Unpacking using VM

Introspection

17(C) 2015 U. Louisiana at Lafayette7/22/2015

Malware

Windows Guest OS

QEMU Hypervisor

Linux Guest OS

Virtualization Stack

VB UNPACKER CODE

Observe malware below ring 0

Page 18: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Unpacking Traditionally Packed

Malware

18(C) 2015 U. Louisiana at Lafayette7/22/2015

Execute Malware

Detect When to

Stop

Extract Revealed

Code

Page 19: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

When to Stop: Hump and Dump

Addresses Ordered by last execution time

• Traditional Packer• Decryption in a loop

• High instruction execution frequency

• Spike in frequency graph

• Hump & Dump Algorithm• Detect spike – hump• Detect end – flat

19(C) 2015 U. Louisiana at Lafayette7/22/2015

PE Compact2.5 (calc.exe), Linear Scale

Page 20: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

When to Stop: TimeOut

• What if Hump is never detected?• TimeOut

• Limits execution time

7/22/2015 (C) 2015 U. Louisiana at Lafayette 20

Page 21: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Constructing PE

• Modify OEP using last PC value

• Fix Section Headers

• Copy Memory Contents to new PE

7/22/2015 (C) 2015 U. Louisiana at Lafayette 21

Page 22: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Extracting Memory Contents:

Challenge

• Extracting memory through hypervisor

• Memory contents may be paged out by GuestOS

• Solution:• Determine memory is paged out

• Analyze execution profile

• Re-run unpacker with new parameters• Catch before memory is paged out

7/22/2015 (C) 2015 U. Louisiana at Lafayette 22

Page 23: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Dataset Description:• File Type: PE-32• Source: FBI• Availability: Upon request• Collection period: 1 year

Bot Family # Executables

Aldibot 19

Armageddon 1

Blackenergy 65

Darkcomet 339

Darkshell 379

Ddoser 5

Illusion 17

Nitol 11

Optima 160

Ponyloader 1,312

Smokeloader 31

Umbraloader 25

Yzf 4

Zeus 41

Total 2,409

Case Study

23(C) 2015 U. Louisiana at Lafayette7/22/2015

Page 24: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Case Study: Results

7/22/2015 (C) 2015 U. Louisiana at Lafayette 24

Unpacked Binary “very similar” to Original => Poor Unpacking

HEURISTIC

Original Unique

Unpacked

Poor

Unpacking

Poor

Unpacking

(%)

Hump and Dump 1,671 1,523 205 12.27

TimeOut 515 500 46 8.93

Self-tuning 168 163 23 13.69

TOTAL 2,354 2,186 274 11.64

• Input : 2,409 • Unpacked : 2,354• Output : 2,185

Page 25: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Unpacker’s Impact: Analysis Cost

Reduction

UnpackedMalware

CNC IP

Malware Binaries # %

Original binaries 2,409

Unpacked successfully 2,354 97.7%

Unique unpack result 2,185 92.8%

Reduction in analyst work 169 7.2%Multiple malware produce the same unpacked result

7/22/2015 (C) 2015 U. Louisiana at Lafayette 25

Page 26: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Matching Code

(C) 2015 U. Louisiana at Lafayette 267/22/2015

Page 27: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Challenge 2: Code Obfuscation

7/22/2015 (C) 2015 U. Louisiana at Lafayette 27

push ecx

mov ecx,ebp

add ecx,33

push esi

mov esi,ecx

sub esi,34

mov [esi-2],eax

pop esi

pop ecx

push ecx

mov ecx, ebp

push eax

mov eax, 33

add ecx, eax

pop eax

push esi

mov esi, ecx

push edx

mov edx, 34

sub esi, edx

pop edx

mov [esi - 2], eax

pop esi

pop ecx

push ecx

mov ecx, [ebp + 10]

mov ecx, ebp

push eax

add eax, 2342

mov eax, 33

add ecx, eax

pop eax

mov eax, esi

push eax

mov esi, ecx

push edx

xor edx, 778f

mov edx, 34

sub esi, edx

pop edx

mov [esi-2], eax

pop esi

pop ecx

push ecx

mov ecx,ebp

add ecx,33

mov [ecx-36],eax

pop ecx

Page 28: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Requirements

• Scale requirement• Search in collection of thousands to millions of malware

• Performance requirement• Provide results in seconds, or less

• Quality requirement• Error rates should be comparable to pairwise matching

7/22/2015 (C) 2015 U. Louisiana at Lafayette 28

Page 29: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Representations for Matching

Binaries

29

Map binary to ‘document’

Word = N-Bytes

(380091df) (0091df96) (91df96f6) (df96f633)

Disassemble

Word = N-mnemonic

(je push)(push mov)(mov pop)(pop xor)

Abstracted Bytecode

(C) 2015 U. Louisiana at Lafayette7/22/2015

Page 30: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

VirusBattle Strategy

30

Map binary to CFG to Document

Binary Disassembly CFG

Word = Block

Abstracted Bytecode

Abstracted Disassembly

Semantics

Juice

(C) 2015 U. Louisiana at Lafayette7/22/2015

Page 31: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Code to Semantics

31

push ebpmov ebp,espsub esp,4mov eax, DWORD ebp+4mov DWORD ebp+8,eaxmov eax, DWORD ebpmov DWORD ebp-4,eax

eax = def(ebp)ebp = -4+def(esp)esp = -8+def(esp)

memdw(-8+def(esp))= def(ebp)memdw(-4+def(esp))= def(ebp)memdw(4+def(esp)) = def(memdw(def(esp)))

Code Semantics• Sequential• Focus on operations

• Parallel• Captures affect

Interpret: seq(Instruction) -> State -> StateState = LValue -> RValue

LValue = Register + Mem

+ def(RValue)+ RValue op Rvalue+ op RValue

Unsimplified

Value in previous state

RValue = Int

7/22/2015 (C) 2015 U. Louisiana at Lafayette

Page 32: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Limitations of (Block) Semantics

• Does not capture:• Register renaming

• Memory address reassignment

• Code motion between blocks

• Evolutionary changes• Hashes good for strict equality

• Solution:• Generalize semantics

• Juice

• Use n-Block semantics

• Use fuzzy hashes

327/22/2015 (C) 2015 U. Louisiana at Lafayette

Page 33: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Semantics to ‘words’

• Challenge:• How to map equal semantics to the same `word’?

• Solution:• Define canonical ordering

• RValue structures are ground

• Use ordering over symbols

• Account for commutativity

• Sum-of-product form

• Simplify

• Word = Hash (md5, SHA1) of linearized semantics

33

RValue = Int+ def(RValue)+ RValue op Rvalue+ op RValue

7/22/2015 (C) 2015 U. Louisiana at Lafayette

Page 34: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Computing Juice

34

Problem: Establish constraints between induced variables?

Solution

1. Track simplification steps

2. Generalize simplification steps

7/22/2015 (C) 2015 U. Louisiana at Lafayette

Page 35: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Semantics and Juice

35

push ebpmov ebp,espsub esp,4mov eax, DWORD ebp+4mov DWORD ebp+8,eaxmov eax, DWORD ebpmov DWORD ebp-4,eax

eax = def(ebp)ebp = -4+def(esp)esp = -8+def(esp)

memdw(-8+def(esp))= def(ebp)memdw(-4+def(esp))= def(ebp)memdw(4+def(esp)) = def(memdw(def(esp)))

Code Semantics

A = def(B), B = N1+def(C), C = N2+def(C),

memdw(N2+def(C)) = def(B)memdw(N1+def(C)) = def(B)memdw(N3+def(C)) = def(memdw(def(C)))

where A, B, C are ‘registers’N1, N2, N3 are ‘Int’

Juice

• Inductive GeneralizationReplace registers and constants by variables

7/22/2015 (C) 2015 U. Louisiana at Lafayette

RValue = Int+ def(RValue)+ RValue op Rvalue+ op RValue+ Variable

Page 36: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Challenge 3: Scalable Search

7/22/2015 (C) 2015 U. Louisiana at Lafayette 36

Page 37: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Featurization Process

37

Unpack Disassembly

Procedure

Procedure

Procedure

Hash

Hash

Hash

MinHash

MinHash

Feature VectorBinary

Binary

Compiler Attributes

7/22/2015 (C) 2015 U. Louisiana at Lafayette

Page 38: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

MinHash: A form of LSH

7/22/2015 (C) 2015 U. Louisiana at Lafayette 38

A Feature B

Light Brown Hair Color Dark Brown

Long Hair Length Long

Brown Eye Color Brown

MinHash Signature-1 MinHash Signature-2

Feature = MinHash FunctionSet of Features = MinHash Signature

Compose for Deterministic manipulations

Page 39: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Architecture

7/22/2015 (C) 2015 U. Louisiana at Lafayette 39

Page 40: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

VirusBattle Webservice Architecture

7/22/2015 (C) 2015 U. Louisiana at Lafayette 40

Page 41: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Empirical Results

7/22/2015 (C) 2015 U. Louisiana at Lafayette 41

Page 42: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Dataset

7/22/2015 (C) 2015 U. Louisiana at Lafayette 42

0

500

1000

1500

2000

2500

3000

unpacked

original

Bots harvested in 2013

Page 43: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

“Interesting” Procedures

7/22/2015 (C) 2015 U. Louisiana at Lafayette 43

Page 44: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

7/22/2015 (C) 2015 U. Louisiana at Lafayette 44

Libraries ID’ed by IDA

Page 45: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Transitive Library via Semantics

7/22/2015 (C) 2015 U. Louisiana at Lafayette 45

Page 46: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

RE Cost Reduction

1.7 Million+ procedures

32K+ semantically unique procedures

7/22/2015 (C) 2015 U. Louisiana at Lafayette 46

# of semantically unique procedures

# of procedures in binaries

# of IDA unique procedures105K+ procedures

Procedures All IDA Unique

JuiceUnique

Lib Procs 65,113 11,482 4,382

Non LibProcs

1,644,355 96.2%

93,916 89.1%

27,859 86.4%

Total 1,709,468 105,398 32,241

Page 47: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Intelligence: Connecting Families

Aldibot

Ponyloader

Smokeloader

Darkcomet

Semantically similar binaries between malware families

7/22/2015 (C) 2015 U. Louisiana at Lafayette 47

Page 48: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Intelligence: Code Sharing

7/22/2015 (C) 2015 U. Louisiana at Lafayette 48

Optima DarkComet

Shared in 13/159 samplesShared in 65/319 samples

Non-Lib Unpacked Procedure

Page 49: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Intelligence: Code Evolution

7/22/2015 (C) 2015 U. Louisiana at Lafayette 49

ddoser: 10 samples#

Pro

ced

ure

s Sh

ared

Percent binaries

200 in 100%

100 in 10%

Page 50: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Intelligence: Needle in Haystack

7/22/2015 (C) 2015 U. Louisiana at Lafayette 50

darkcomet: 649 samples

20 in 60-65%

930 in 0-5%

# P

roce

du

res

Shar

ed

Percent binaries

Page 51: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Performance

95% percentileSemantic analysis: < 15sUnpacking: < 30sMalware Search: < 7sProcedure Search: < 100ms

Distribution of analysis time

7/22/2015 (C) 2015 U. Louisiana at Lafayette 51

Page 52: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

VirusBattle: In a nutshell

Key Innovations

• Automated unpacking using VM introspection

• Semantic fingerprints, as against bits-based fingerprints

• Innovative 2-tier search algorithm for fast searches

• Search at various granularity:Whole binary, procedures, blocks, strings

• Interfaces with Palantir’s Forensic Investigation platform

Rapidly extract intelligence from malwareApplication

Impact• Order of magnitude improvement in malware

analysts capabilityUnpacking time:

Reduced from days/weeks to minutes

Analysis work: Reduce efforts from weeks/months to minutes

New capability: Build knowledge base of analysis indexed on similar codeShare analysts’ experience across malware families

Component Time(*) Accuracy

Unpacker 30 sec 97%

Semantic Juice 15 sec

Binary search 7 sec 95%

Procedure search 100ms

Performance

* Based on analysis of 2,500 botnets binaries; ** Max time to process 95% of files

7/22/2015 (C) 2015 U. Louisiana at Lafayette 52

Page 53: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Blackhat Sound Bytes

• Malware repositories are great source of intelligence

• Semantic juice peers through code obfuscation

• Semantic hashing enables fast search over large repositories

• VM Introspection gives you X-Ray vision over malware

• VirusBattle.com: Malware Intelligence Mining in the Cloud

7/22/2015 (C) 2015 U. Louisiana at Lafayette 53

Page 54: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Contact

Prof. Arun Lakhotia

Vivek Notani

University of Louisiana at Lafayette, Lafayette, LA, USA

[email protected]

[email protected]

7/22/2015 (C) 2015 U. Louisiana at Lafayette 54

Page 55: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

Extras

7/22/2015 (C) 2015 U. Louisiana at Lafayette 55

Page 56: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

MinHash: A form of LSH

• Consider Set A and Set B

• Let h(x)->int be a function that takes a member of A or B and gives an integer

• Let hmin (s) represent minimum member of set s w.r.t. h.

• Then,

Pr(hmin(A)=hmin(B)) =

Problem: High Variance!

7/22/2015 (C) 2015 U. Louisiana at Lafayette 56

Page 57: Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University

MinHash Signatures:

• Compose d minhash functions:• Signature Match then implies each of the d functions agree on match

• Pr (sig(A)=sig(B)) = J(A,B)d

Problem: Too many False Negatives!

• Check r minhash signatures:• A Match then implies atleast one of the r signatures agree on match

• Pr (match(A,B)) = 1 - (1 - J(A,B)d )r

7/22/2015 (C) 2015 U. Louisiana at Lafayette 57