Upload
others
View
27
Download
0
Embed Size (px)
Citation preview
Compiler-Agnostic Function Detection in Binaries
Dennis Andriesse†, Asia Slowinska, Herbert Bos†
†Vrije Universiteit Amsterdam
EuroS&P 2017
Introduction
Disassembly in Systems Security
Disassembly is the backbone of all binary-level systems security work(and more)
• Control-Flow Integrity
• Automatic Vulnerability/Bug Search
• Lifting binaries to LLVM/IR (e.g., for reoptimization)
• Malware Analysis
• Binary Hardening
• Binary Instrumentation
• . . .
Compiler-Agnostic Function Detection in Binaries 1 of 19
Introduction
Results from Previous Work
Function detection currently the main disassembly challenge
• Even function start detection yields many FPs/FNs (20%+)
• Complex cases: non-standard prologues, tailcalls, inlining, . . .
• Binary analysis commonly requires function information
0
20
40
60
80
100
O0 O1 O2 O3
% c
orre
ct (g
eom
etric
mea
n)
gcc-5.1.1 x86
angr 4.6.1.4BAP 0.9.9
ByteWeight 0.9.9Dyninst 9.1.0
Hopper 3.11.5IDA Pro 6.7
Jakstab 0.8.4
SPEC (C)SPEC (C++)
O0 O1 O2 O3
gcc-5.1.1 x64
O0 O1 O2 O3
clang-3.7.0 x86
O0 O1 O2 O3
clang-3.7.0 x64
O0 O1 O2 O3
Visual Studio '15 x86
O0 O1 O2 O3
Visual Studio '15 x64
Figure: Correctly detected function start addresses
Compiler-Agnostic Function Detection in Binaries 2 of 19
Introduction
Function Detection: False Negative
Listing: False negative indirectly called function for IDA Pro 6.7 (gcccompiled with gcc at O3 for x64 ELF)
6caf10 <ix86 fp compare mode>:
6caf10: mov 0x3f0dde(%rip),%eax
6caf16: and $0x10,%eax
6caf19: cmp $0x1,%eax
6caf1c: sbb %eax,%eax
6caf1e: add $0x3a,%eax
6caf21: retq
Compiler-Agnostic Function Detection in Binaries 3 of 19
Introduction
Function Detection: False Positive
Listing: False positive function (shaded) for Dyninst (perlbenchcompiled with gcc at O3 for x64 ELF)
46b990 <Perl pp enterloop>:
[...]
46ba02: ja 46bb50 <Perl pp enterloop+0x1c0>
46ba08: mov %rsi,%rdi
46ba0b: shl %cl,%rdi
46ba0e: mov %rdi,%rcx
46ba11: and $0x46,%ecx
46ba14: je 46bb50 <Perl pp enterloop+0x1c0>
[...]
46bb47: pop %r12
46bb49: retq
46bb4a: nopw 0x0(%rax,%rax,1)
46bb50: sub $0x90,%rax
Compiler-Agnostic Function Detection in Binaries 4 of 19
Current Approaches
Signature-Based Function Detection
• Most current approaches scan for prologue/epilogue signatures• IDA Pro, Dyninst, ByteWeight (Bao et al. 2014), (Shin et al.
2015)
• Error-prone: sigs may be missing/optimized away
• Non-scalable: new sigs needed for every compilerversion/platform
• Even machine learning approaches need continuous retraining
Compiler-Agnostic Function Detection in Binaries 5 of 19
Overview of Our Approach
Compiler-Agnostic Function Detection
• We propose a signature-less approach based on structuralanalysis of the Control-Flow Graph (CFG)
• Basic premise: Weakly Connected Components Analysis
• Compiler-agnostic: no training/maintenance needed
• Able to detect all basic blocks of a function
• Inherent support for corner cases such as non-contiguousfunctions
Compiler-Agnostic Function Detection in Binaries 6 of 19
Overview of Our Approach
call
1
1© Disassemble binary and generate interprocedural CFG (lineardisassembly + switch/inline data detection)
Compiler-Agnostic Function Detection in Binaries 7 of 19
Overview of Our Approach
2
2© Hide edges e ∈ Ecall
Compiler-Agnostic Function Detection in Binaries 8 of 19
Overview of Our Approach
3
f1
f2
f3
3© Locate directly called entry points and expand functions byfollowing control flow (ignoring direction)
Compiler-Agnostic Function Detection in Binaries 9 of 19
Overview of Our Approach
4
f1
f2
f3
f4
4© Find remaining functions using Connected Components Analysis,analyze control-flow to find entry points
Compiler-Agnostic Function Detection in Binaries 10 of 19
Evaluation
0.0
0.2
0.4
0.6
0.8
1.0
O0 O1 O2 O3
f-sco
re
gcc-5.1.1 x86
NucleusDyninst 9.1.0
BAP/ByteWeight 0.9.9IDA Pro 6.7
C
C++
O0 O1 O2 O3
gcc-5.1.1 x64
O0 O1 O2 O3
clang-3.7.0 x86
O0 O1 O2 O3
clang-3.7.0 x64
O0 O1 O2 O3
Visual Studio '15 x86
O0 O1 O2 O3
Visual Studio '15 x64
Function Start Detection
• Overall average F-score of 0.96 for SPEC CPU 2006 (similar forservers)
• Stable performance across compiler/platform/optimization level
• Main improvement over others: higher recall (fewer FNs)
Compiler-Agnostic Function Detection in Binaries 11 of 19
Evaluation
0.0
0.2
0.4
0.6
0.8
1.0
O0 O1 O2 O3
f-sco
re
gcc-5.1.1 x86
NucleusDyninst 9.1.0
BAP/ByteWeight 0.9.9IDA Pro 6.7
C
C++
O0 O1 O2 O3
gcc-5.1.1 x64
O0 O1 O2 O3
clang-3.7.0 x86
O0 O1 O2 O3
clang-3.7.0 x64
O0 O1 O2 O3
Visual Studio '15 x86
O0 O1 O2 O3
Visual Studio '15 x64
Function Boundary Detection
• Overall average F-score of 0.90 for SPEC CPU 2006
• Even better for C-only server tests (average F-score 0.97)
• Again, more stable than other approaches
• Best alternative: IDA Pro, average F-score of 0.84
Compiler-Agnostic Function Detection in Binaries 12 of 19
Evaluation
More Results
• In-depth analysis of results (including FPs/FNs) in paper
• Most complex cases handled correctly (non-contiguous functions,multi-entry functions, . . . )
• Main problematic case: tail calls
Compiler-Agnostic Function Detection in Binaries 13 of 19
Evaluation
0
20
40
60
80
100
120
140
160
1000 10000 100000 1x106
runt
ime
(s)
# instructions
NucleusDyninst 9.1.0
IDA Pro 6.7BAP/ByteWeight 0.9.9
Runtime• On par with fastest alternatives
Compiler-Agnostic Function Detection in Binaries 14 of 19
Applicability to Malware Analysis
Resistance to Obfuscation• Although this talk is in the Malware session, we do not explicitly
target malware
• That said, our approach is agnostic of some basic obfuscationapproaches
• Instruction-level polymorphism• Mangling of function prologues/epilogues• Some control flow obfuscations (e.g., converting direct calls to
indirect, branching functions, . . . )
• But we make no promises for arbitrary obfuscations!
Compiler-Agnostic Function Detection in Binaries 15 of 19
Issues with Evaluation of Machine Learning Approaches
Performance Discrepancies
• During our evaluation, noticed far lower performance forByteWeight than previously reported (Bao et al. 2014)
• Mean F-score 0.32 points lower than expected
• Observation persists for gcc (v4.7–v5.1), clang, and VisualStudio
• Upon closer inspection, discovered issues with test suite used toevaluate all major machine learning-based function detectionwork (Bao et al. 2014 and Shin et al. 2015)
Compiler-Agnostic Function Detection in Binaries 16 of 19
Issues with Evaluation of Machine Learning Approaches
Test Suite Issues• Both Bao et al. and Shin et al. use ten-fold cross-validation to
evaluate their work
• Partition test suite into training set (BT , 90% of binaries) andevaluation set (BE)
• Repeat ten times such that each binary is in BE exactly once
• Crucial to ensure sufficient variation in test suite to preventoverfitting!
Compiler-Agnostic Function Detection in Binaries 17 of 19
Issues with Evaluation of Machine Learning Approaches
Test Suite Issues• Linux test suite used by Bao et al. and Shin et al. consists ofcoreutils (106 binaries), binutils (16 binaries), andfindutils (7 binaries)
• Average coreutils binary shares 54% of its functions with allother coreutils binaries
• Average coreutils binary shares 94% of its functions with atleast one other coreutils binary
• For the average coreutils binary in BE , at least 86% of itsfunctions are expected to occur in BT
• Large degree of overfitting in evaluation of machinelearning approaches, re-evaluation needed
Compiler-Agnostic Function Detection in Binaries 18 of 19
Conclusion
• We introduced a novel compiler-agnostic function detector
• No maintenance/learning phase required
• More accurate results than existing approaches
• Inherent support for complex cases
• Available open source:https://www.vusec.net/projects/function-detection/
• Features export to IDA Pro → easy to use in real-world setting
Compiler-Agnostic Function Detection in Binaries 19 of 19