27
Embracing the new threat: towards automatically, self-diversifying malware Mathias Payer <[email protected] > UC Berkeley and (soon) Purdue University Image (c) http://ucrtoday.ucr.edu/9768/assassin-bugs

Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Embracing the new threat: towards automatically, self-diversifying malware

Mathias Payer <[email protected]>UC Berkeley and (soon) Purdue University

Image (c) http://ucrtoday.ucr.edu/9768/assassin-bugs

Page 2: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Malware landscape is changing

Image (c) Wikimedia

Page 3: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

The ongoing malware arms race

Generate newmalwareinstance

Attack a bunchof targets

AV vendorgets first sample

Malwareanalysis

Signaturesupdated

Page 4: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Defense limitations

● Newly diversified samples are not detected– Basically a “new” attack

● New malware spreads fast– Time lag between analysis and updated signatures

● Can we automate this process?

Page 5: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Fully automatic diversity

*.cpp Compiler

Malware Malware Malware

?

Page 6: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Outline

State of the art:Malware detection

A new threat:Malware diversification

Possible mitigation:Better security practices

Page 7: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

State of the art: Malware detection

Image (c) Wikimedia

Page 8: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Malware detection is limited

● Performance– Don't slow down a user's machine (too much)

● Precision– Behavioral, generic matching

● Latency– Time lag between spread and protection

Page 9: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Detection mechanisms

Image (c) Wikimedia

Page 10: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Signature-based detection

● Compare against database of known-bad– Extract pattern

– Match sequence of bytes or regular expression

● Advantages– Fast

– Low false positive rate

● Disadvantages– Precision limited to known-bad samples

Page 11: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Static analysis-based detection

● Search potentially bad patterns– API calls

– System calls

● Advantages– Low overhead

● Disadvantages– False positives

– Based on well-known heuristics

Page 12: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Behavioral-based detection

● Execute “file” in a virtual machine– Detect modifications

● Advantages– Most precise

● Disadvantages– High overhead

– Precision limited due to emulation detection

Page 13: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Summary: Malware protection

● Arms race due to manual diversification– Signature-based techniques loose effectiveness

● Cope with limited resources– On the target machine, for the analysis, and to push

new signatures/heuristics

● No perfect solution– Either false positives and/or negatives or huge

performance impact

Page 14: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

New threat: Malware diversification

Image (c) Wikimedia

Page 15: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Software diversification

*.cpp Compiler

Program Program Program

?

Page 16: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

C/C++ liberties

● Data layout changes– Data structure layout on stack

– Layout for heap objects (limited for structs)

● Code changes– Register allocation (shuffle or starve)

– Instruction selection

– Basic block splitting, merging, shuffling

Page 17: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Malware diversification

● Generate unique binaries– Minimize common substrings (code or data)

– Performance overhead not an issue

● Diversify code and data layout● Diversify static data as well

Page 18: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Implementation

● Prototype built on LLVM 3.4– Small changes in code generator, code layouter,

register allocator, stack frame layouter, some data obfuscation passes

● Input: LLVM bitcode● Output: diversified binary

● Source: http://github.com/gannimo/MalDiv

Page 19: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Similarity limitations

12

34

56

78

910

1112

1314

1516

1718

1920

2122

2324

2526

2728

2930

3132

3334

3536

3738

3940

4142

4344

4546

4748

4950

5152

5354

5556

57

1

10

100

1000

10000

100000

1000000Common subsequences in diversified binaries

400.perlbench401.bzip2429.mcf433.milc444.namd445.gobmk450.soplex453.povray456.hmmer458.sjeng462.libquantum464.h264ref470.lbm471.omnetpp473.astar482.sphinxperlbench vs. bzip2perlbench vs. gobmksoplex vs. omnetppnmapsimple port scanner

Lenght of subsequence

Num

ber

of s

ubse

quen

ces

(log

sca

le)

Page 20: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Demo

● Simple hello world– Let's see how far we can push this!

#include <stdio.h, string.h>

const char foo[] = "foobar";char bar[7];

int main(int argc, char* argv[]) {strcpy(bar, "barfoo");printf("Hello World %s %s\n", foo, bar);printf("Arguments: %d, executable: %s\n",

argc, argv[0]);return 0;

}

Page 21: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Scenario 1: malware generator

*.cpp Compiler

Malware Malware Malware

?

Page 22: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Scenario 2: self-diversifying MW

LLVM Opt

Malware bc

LLVM bc

Malware

LLVM Opt*

Malware*

Malware bc*

LLVM bc*

LLVM Opt*

Malware*

Malware bc*

LLVM bc*

LLVM Opt*

Malware*

Malware bc*

LLVM bc*

Page 23: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Possible mitigation:Better security practices

Image (c) Wikimedia

Page 24: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Mitigation

● Recover high-level semantics from code– Hard (and results in an arms race)

● Full behavioral analysis– Harder

● Prohibit initial intrusion– Fix broken software & educate users

– Hardest

Page 25: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Conclusion

Image (c) Wikimedia

Page 26: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Conclusion

● Diversity evades malware detection– Fully automatic, built into compiler

– No need for packers anymore

● Adopts to new similarity metrics● New arms race between defenders and

compiler writers● Don't rely on simple, static similarity!

Page 27: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection

Questions?

Mathias Payer <[email protected]>Project: https://github.com/gannimo/MalDiv Homepage: https://nebelwelt.net