20
Compiler-Agnostic Function Detection in Binaries Dennis Andriesse , Asia Slowinska, Herbert Bos Vrije Universiteit Amsterdam EuroS&P 2017

Compiler-Agnostic Function Detection in Binaries · Compiler-Agnostic Function Detection in Binaries Dennis Andriessey, Asia Slowinska, Herbert Bosy ... Compiler-Agnostic Function

  • Upload
    others

  • View
    27

  • Download
    0

Embed Size (px)

Citation preview

Compiler-Agnostic Function Detection in Binaries

Dennis Andriesse†, Asia Slowinska, Herbert Bos†

†Vrije Universiteit Amsterdam

EuroS&P 2017

Introduction

Disassembly in Systems Security

Disassembly is the backbone of all binary-level systems security work(and more)

• Control-Flow Integrity

• Automatic Vulnerability/Bug Search

• Lifting binaries to LLVM/IR (e.g., for reoptimization)

• Malware Analysis

• Binary Hardening

• Binary Instrumentation

• . . .

Compiler-Agnostic Function Detection in Binaries 1 of 19

Introduction

Results from Previous Work

Function detection currently the main disassembly challenge

• Even function start detection yields many FPs/FNs (20%+)

• Complex cases: non-standard prologues, tailcalls, inlining, . . .

• Binary analysis commonly requires function information

0

20

40

60

80

100

O0 O1 O2 O3

% c

orre

ct (g

eom

etric

mea

n)

gcc-5.1.1 x86

angr 4.6.1.4BAP 0.9.9

ByteWeight 0.9.9Dyninst 9.1.0

Hopper 3.11.5IDA Pro 6.7

Jakstab 0.8.4

SPEC (C)SPEC (C++)

O0 O1 O2 O3

gcc-5.1.1 x64

O0 O1 O2 O3

clang-3.7.0 x86

O0 O1 O2 O3

clang-3.7.0 x64

O0 O1 O2 O3

Visual Studio '15 x86

O0 O1 O2 O3

Visual Studio '15 x64

Figure: Correctly detected function start addresses

Compiler-Agnostic Function Detection in Binaries 2 of 19

Introduction

Function Detection: False Negative

Listing: False negative indirectly called function for IDA Pro 6.7 (gcccompiled with gcc at O3 for x64 ELF)

6caf10 <ix86 fp compare mode>:

6caf10: mov 0x3f0dde(%rip),%eax

6caf16: and $0x10,%eax

6caf19: cmp $0x1,%eax

6caf1c: sbb %eax,%eax

6caf1e: add $0x3a,%eax

6caf21: retq

Compiler-Agnostic Function Detection in Binaries 3 of 19

Introduction

Function Detection: False Positive

Listing: False positive function (shaded) for Dyninst (perlbenchcompiled with gcc at O3 for x64 ELF)

46b990 <Perl pp enterloop>:

[...]

46ba02: ja 46bb50 <Perl pp enterloop+0x1c0>

46ba08: mov %rsi,%rdi

46ba0b: shl %cl,%rdi

46ba0e: mov %rdi,%rcx

46ba11: and $0x46,%ecx

46ba14: je 46bb50 <Perl pp enterloop+0x1c0>

[...]

46bb47: pop %r12

46bb49: retq

46bb4a: nopw 0x0(%rax,%rax,1)

46bb50: sub $0x90,%rax

Compiler-Agnostic Function Detection in Binaries 4 of 19

Current Approaches

Signature-Based Function Detection

• Most current approaches scan for prologue/epilogue signatures• IDA Pro, Dyninst, ByteWeight (Bao et al. 2014), (Shin et al.

2015)

• Error-prone: sigs may be missing/optimized away

• Non-scalable: new sigs needed for every compilerversion/platform

• Even machine learning approaches need continuous retraining

Compiler-Agnostic Function Detection in Binaries 5 of 19

Overview of Our Approach

Compiler-Agnostic Function Detection

• We propose a signature-less approach based on structuralanalysis of the Control-Flow Graph (CFG)

• Basic premise: Weakly Connected Components Analysis

• Compiler-agnostic: no training/maintenance needed

• Able to detect all basic blocks of a function

• Inherent support for corner cases such as non-contiguousfunctions

Compiler-Agnostic Function Detection in Binaries 6 of 19

Overview of Our Approach

call

1

1© Disassemble binary and generate interprocedural CFG (lineardisassembly + switch/inline data detection)

Compiler-Agnostic Function Detection in Binaries 7 of 19

Overview of Our Approach

2

2© Hide edges e ∈ Ecall

Compiler-Agnostic Function Detection in Binaries 8 of 19

Overview of Our Approach

3

f1

f2

f3

3© Locate directly called entry points and expand functions byfollowing control flow (ignoring direction)

Compiler-Agnostic Function Detection in Binaries 9 of 19

Overview of Our Approach

4

f1

f2

f3

f4

4© Find remaining functions using Connected Components Analysis,analyze control-flow to find entry points

Compiler-Agnostic Function Detection in Binaries 10 of 19

Evaluation

0.0

0.2

0.4

0.6

0.8

1.0

O0 O1 O2 O3

f-sco

re

gcc-5.1.1 x86

NucleusDyninst 9.1.0

BAP/ByteWeight 0.9.9IDA Pro 6.7

C

C++

O0 O1 O2 O3

gcc-5.1.1 x64

O0 O1 O2 O3

clang-3.7.0 x86

O0 O1 O2 O3

clang-3.7.0 x64

O0 O1 O2 O3

Visual Studio '15 x86

O0 O1 O2 O3

Visual Studio '15 x64

Function Start Detection

• Overall average F-score of 0.96 for SPEC CPU 2006 (similar forservers)

• Stable performance across compiler/platform/optimization level

• Main improvement over others: higher recall (fewer FNs)

Compiler-Agnostic Function Detection in Binaries 11 of 19

Evaluation

0.0

0.2

0.4

0.6

0.8

1.0

O0 O1 O2 O3

f-sco

re

gcc-5.1.1 x86

NucleusDyninst 9.1.0

BAP/ByteWeight 0.9.9IDA Pro 6.7

C

C++

O0 O1 O2 O3

gcc-5.1.1 x64

O0 O1 O2 O3

clang-3.7.0 x86

O0 O1 O2 O3

clang-3.7.0 x64

O0 O1 O2 O3

Visual Studio '15 x86

O0 O1 O2 O3

Visual Studio '15 x64

Function Boundary Detection

• Overall average F-score of 0.90 for SPEC CPU 2006

• Even better for C-only server tests (average F-score 0.97)

• Again, more stable than other approaches

• Best alternative: IDA Pro, average F-score of 0.84

Compiler-Agnostic Function Detection in Binaries 12 of 19

Evaluation

More Results

• In-depth analysis of results (including FPs/FNs) in paper

• Most complex cases handled correctly (non-contiguous functions,multi-entry functions, . . . )

• Main problematic case: tail calls

Compiler-Agnostic Function Detection in Binaries 13 of 19

Evaluation

0

20

40

60

80

100

120

140

160

1000 10000 100000 1x106

runt

ime

(s)

# instructions

NucleusDyninst 9.1.0

IDA Pro 6.7BAP/ByteWeight 0.9.9

Runtime• On par with fastest alternatives

Compiler-Agnostic Function Detection in Binaries 14 of 19

Applicability to Malware Analysis

Resistance to Obfuscation• Although this talk is in the Malware session, we do not explicitly

target malware

• That said, our approach is agnostic of some basic obfuscationapproaches

• Instruction-level polymorphism• Mangling of function prologues/epilogues• Some control flow obfuscations (e.g., converting direct calls to

indirect, branching functions, . . . )

• But we make no promises for arbitrary obfuscations!

Compiler-Agnostic Function Detection in Binaries 15 of 19

Issues with Evaluation of Machine Learning Approaches

Performance Discrepancies

• During our evaluation, noticed far lower performance forByteWeight than previously reported (Bao et al. 2014)

• Mean F-score 0.32 points lower than expected

• Observation persists for gcc (v4.7–v5.1), clang, and VisualStudio

• Upon closer inspection, discovered issues with test suite used toevaluate all major machine learning-based function detectionwork (Bao et al. 2014 and Shin et al. 2015)

Compiler-Agnostic Function Detection in Binaries 16 of 19

Issues with Evaluation of Machine Learning Approaches

Test Suite Issues• Both Bao et al. and Shin et al. use ten-fold cross-validation to

evaluate their work

• Partition test suite into training set (BT , 90% of binaries) andevaluation set (BE)

• Repeat ten times such that each binary is in BE exactly once

• Crucial to ensure sufficient variation in test suite to preventoverfitting!

Compiler-Agnostic Function Detection in Binaries 17 of 19

Issues with Evaluation of Machine Learning Approaches

Test Suite Issues• Linux test suite used by Bao et al. and Shin et al. consists ofcoreutils (106 binaries), binutils (16 binaries), andfindutils (7 binaries)

• Average coreutils binary shares 54% of its functions with allother coreutils binaries

• Average coreutils binary shares 94% of its functions with atleast one other coreutils binary

• For the average coreutils binary in BE , at least 86% of itsfunctions are expected to occur in BT

• Large degree of overfitting in evaluation of machinelearning approaches, re-evaluation needed

Compiler-Agnostic Function Detection in Binaries 18 of 19

Conclusion

• We introduced a novel compiler-agnostic function detector

• No maintenance/learning phase required

• More accurate results than existing approaches

• Inherent support for complex cases

• Available open source:https://www.vusec.net/projects/function-detection/

• Features export to IDA Pro → easy to use in real-world setting

Compiler-Agnostic Function Detection in Binaries 19 of 19