Silvio Cesare, Deakin University
Who am I and where did this talk come from?
PhD student at Deakin University.
Research focus includes malware detection and automated vulnerability detection.
Software similarity is the focus of this talk.
This talk is an overview of the core topics, how they are approached in academia, and a web service that identifies software similarity.
Introduction
Many applications of software similarity and classification:
Malware Detection
Software Theft Detection
Plagiarism Detection
Software Clone Detection
Problem Formulation
Extract features, fingerprints, or 'birthmarks' from programs p and q.
If birthmark(p) is similar to birthmark(q), then the programs are similar.
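The formulation above can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: the birthmark here is assumed to be the set of byte n-grams of a program, similarity is assumed to be the Jaccard index, and the names `birthmark`, `similarity`, `is_similar` and the threshold value are all hypothetical.

```python
def birthmark(program: bytes, n: int = 4) -> frozenset:
    """Hypothetical birthmark: the set of all byte n-grams in the program."""
    return frozenset(program[i:i + n] for i in range(len(program) - n + 1))


def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard index of two birthmarks, in the range [0, 1]."""
    if not a and not b:
        return 1.0  # two empty birthmarks are treated as identical
    return len(a & b) / len(a | b)


def is_similar(p: bytes, q: bytes, threshold: float = 0.6) -> bool:
    """Programs are similar if their birthmarks are similar enough.

    The threshold is an arbitrary illustrative value.
    """
    return similarity(birthmark(p), birthmark(q)) >= threshold
```

A real system would choose a birthmark from the feature taxonomy (instructions, control flow graphs, API calls, and so on) rather than raw bytes, which are sensitive to recompilation and packing.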
Software Similarity Problem
Taxonomy of Program Features
Raw Code
Abstract Syntax Trees
Variables
Pointers
Instructions
Basic Blocks
Procedures
API Calls
Control Flow Graphs
Call Graphs
Data Flow
Procedure Dependency Graphs
System Dependency Graphs
Object Inheritance and Dependency
Program Features Examples
AST (left) and Control Flow (right)
[Figure, AST: an 'if' node with three children: '==' (condition), 'return' (then), '=' (else); leaves: x, 0, x, 1]
[Figure, control flow graph with four basic blocks:]
Basic block:
lea 0x4(%esp),%ecx
and $0xfffffff0,%esp
pushl -0x4(%ecx)
push %ebp
mov %esp,%ebp
push %ecx
sub $0x24,%esp
call 4011b0 <___main>
movl $0x0,-0x8(%ebp)
jmp 40115f <_main+0x2f>
Basic block:
movl $0x4020a0,(%esp)
call 4011b8 <_puts>
addl $0x1,-0x8(%ebp)
Basic block:
cmpl $0x9,-0x8(%ebp)
jle 40114f <_main+0x1f>
Basic block:
add $0x24,%esp
pop %ecx
pop %ebp
lea -0x4(%ecx),%esp
ret
[Figure: graph with nodes Proc_0, Proc_1, Proc_2, Proc_3, Proc_4]
Taxonomy of Features in Program Binaries
Headers
Object Code
Symbols
Debugging Information
Relocations
Dynamic Linking Information
Program Transformations
Compiler Optimisation and Recompilation
Program Obfuscation
Plagiarism, Software Theft, and Derivative Works
Malware packing, polymorphism and metamorphism
Traditional Malware Packing
[Figure: three stages linked by 'Packing' and 'Runtime']
Original Executable: Original Code
Packed Executable: Restoration Routine, Hidden Code = f(Original Code)
Memory Image at Runtime: Remnant Restoration Routine, Original Code = g(Hidden Code)
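The relationship between f and g can be illustrated with a toy packer. This sketch assumes f is a single-byte XOR, which is only a stand-in: real packers typically compress or encrypt the code. The function names `pack` and `restore` and the key are hypothetical.

```python
def pack(original_code: bytes, key: int = 0x5A) -> bytes:
    """f: hide the original code by XOR-ing every byte with a key (toy example)."""
    return bytes(b ^ key for b in original_code)


def restore(hidden_code: bytes, key: int = 0x5A) -> bytes:
    """g: the restoration routine, here the inverse of f (XOR is its own inverse)."""
    return bytes(b ^ key for b in hidden_code)


# x86 bytes for: push %ebp; mov %esp,%ebp; sub $0x24,%esp
original = b"\x55\x89\xe5\x83\xec\x24"
hidden = pack(original)       # what is stored in the packed executable
recovered = restore(hidden)   # what appears in memory at runtime
```

Because the hidden code on disk shares no bytes with the original, static features extracted from a packed executable say little about the original program, which is why unpacking precedes feature extraction.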
Processing Program Features
Treat features or birthmarks as a mathematical object:
Strings
Vectors
Sets
Sets of Vectors
Trees
Graphs
Software Birthmark Similarity
Strings: edit distance, etc.
Vectors: cosine similarity, Euclidean distance, etc.
Set Similarity: Jaccard distance, etc.
Set of Vectors Similarity: minimum matching distance
Trees and Graphs: edit distances, etc.
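For concreteness, here is a minimal sketch of four of the measures named above, written from their standard textbook definitions. The function names are illustrative; minimum matching distance and tree or graph edit distance are omitted because they need an assignment or search procedure that does not fit a short example.

```python
import math


def edit_distance(s, t) -> int:
    """Levenshtein distance between two sequences (strings, opcode lists, ...)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution or match
        prev = cur
    return prev[-1]


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def euclidean_distance(u, v) -> float:
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def jaccard_distance(a: set, b: set) -> float:
    """1 minus the Jaccard index; 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0
```

For example, `edit_distance("kitten", "sitting")` is 3 and `jaccard_distance({1, 2, 3}, {2, 3, 4})` is 0.5.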
Software Indexing and Searching
Nearest neighbour is the closest program in the database to the query.
Based on 'distance' – a measure of dissimilarity between objects.
Distances that are 'metric' (they satisfy the triangle inequality) allow more efficient indexing and searching.
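One way to see why metric distances help: the triangle inequality gives a lower bound on distance(query, x) from distances to a fixed pivot that were computed ahead of time, so some candidates can be skipped without computing their distance. The sketch below is a deliberately simple single-pivot index, assumed for illustration only; the class name `PivotIndex` is hypothetical, and practical systems use structures such as vantage-point trees or other metric trees.

```python
class PivotIndex:
    """Toy metric index: prunes candidates using one pivot and the triangle inequality."""

    def __init__(self, items, distance):
        self.distance = distance
        self.items = list(items)      # must be non-empty
        self.pivot = self.items[0]    # arbitrary pivot choice
        # Distances to the pivot are computed once, at indexing time.
        self.to_pivot = [distance(x, self.pivot) for x in self.items]

    def nearest(self, query):
        """Return (item, distance, number_of_distance_computations)."""
        d_qp = self.distance(query, self.pivot)
        best_item, best_dist, computed = None, float("inf"), 1
        for x, d_xp in zip(self.items, self.to_pivot):
            # For a metric: distance(query, x) >= |d(query, pivot) - d(x, pivot)|.
            # If that lower bound cannot beat the best so far, skip x entirely.
            if abs(d_qp - d_xp) >= best_dist:
                continue
            d = self.distance(query, x)
            computed += 1
            if d < best_dist:
                best_item, best_dist = x, d
        return best_item, best_dist, computed


# Stand-in metric for the example: absolute difference between numbers.
index = PivotIndex([0, 10, 20, 30, 40], lambda a, b: abs(a - b))
item, dist, computed = index.nearest(31)
```

The pruning step is only valid when the distance is a metric; with a non-metric dissimilarity the lower bound does not hold and the true nearest neighbour could be skipped.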
rNN (Range Nearest Neighbour)
[Figure: two panels, 'Query Malicious' and 'Query Benign', each showing a query q, malware p, the range r, and distance(p,q)]
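A range search can be turned into a simple classifier. The decision rule below is an assumption based on the figure labels: a query is flagged as malicious when at least one known malware birthmark lies within distance r of it. The linear scan, the Jaccard distance, and the function names are illustrative choices.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 minus the Jaccard index of two feature sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def range_query(database, query, r, distance=jaccard_distance):
    """Return every database entry p with distance(p, query) <= r (linear scan)."""
    return [p for p in database if distance(p, query) <= r]


def classify(malware_db, query, r, distance=jaccard_distance) -> str:
    """Assumed rule: malicious if any known malware falls within range r."""
    return "malicious" if range_query(malware_db, query, r, distance) else "benign"


# Hypothetical birthmarks: sets of feature labels.
malware_db = [frozenset({"a", "b", "c", "d"})]
variant = frozenset({"a", "b", "c", "e"})   # shares 3 of 5 features: distance 0.4
unrelated = frozenset({"x", "y"})           # shares nothing: distance 1.0
```

With r = 0.5, `classify(malware_db, variant, 0.5)` returns "malicious" and `classify(malware_db, unrelated, 0.5)` returns "benign". The choice of r trades false positives against missed variants.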
Wiki on Software Similarity and Classification
Book on Software Similarity and Classification
Simseer – A Software Similarity Web Service
Wiki on Software Similarity and Classification
Reviews of academic papers.
http://www.foocodechu.com/wiki
Book on ‘Software Similarity and Classification’
Academic-style survey of the topic.
Published by Springer.
100 pages.
Available in April.
http://www.springer.com/computer/security+and+cryptology/book/978-1-4471-2908-0
Simseer – A Software Similarity Web Service
An online service to identify similarity between programs.
Performs unpacking.
Renders an evolutionary tree to show program relationships.
Free to use!
http://www.foocodechu.com/?q=simseer-a-software-similarity-web-service
Conclusion
Presented a review of software similarity.
Demonstrated a new web service.
Try it!
http://www.foocodechu.com
Questions?