Finding Diversity in Remote Code Injection Exploits
Justin Ma, John Dunagan, Helen J. Wang, Stefan Savage, Geoffrey M. Voelker
*University of California, San Diego
*Microsoft Research
2
Encountering new malware
Have I seen this before?
How closely related is it to what I have seen before?
3
Practical considerations
New defense?
4
Theoretical considerations
Evolutionary relationship?
5
Grouping similar malware together…
• Ultimately, construct malware families
• Anti-virus industry is active in this area
6
Motivation
710 new families
40,000 new variants
Family and variant defined in ad-hoc fashion…
Is there a systematic way to determine the nature of this diversity?
7
Exploit diversity
Attacker
MS RPC Request Exploit
8
Polymorphism
Attacker
Encrypted
9
Behind the encryption…
Attacker
10
Differing constants
Attacker
Different IP address
11
Functional differences
Attacker
Waiting for a connection
12
Different code base
Attacker
Calling “tftp.exe”
13
ISystemActivator vulnerability
1,561 exploit attempts
How different are they?
90 unique payloads
14
Our goal
• Automatically construct a phylogeny, or family tree, of exploits
15
Outline for this talk
• On classifying shellcodes
• Steps for systematically studying shellcodes
  – Trace collection
  – Shellcode extraction
  – Shellcode decryption
  – Comparing samples
  – Cluster analysis
• Post-hoc manual inspection to validate
  – Look at the code!
16
Why shellcodes?
• Our study focuses on exploits
• Shellcodes are packaged with the exploit
  – First foreign code that executes on a newly infected machine
  – Part of the exploit with the most leeway for variation
• Primary challenge: collecting and analyzing shellcodes
17
Remote code injection attacks
[Figure: the victim's stack memory, drawn from high to low addresses. Labels: MS RPC Request Exploit, Vulnerable buffer, Shellcode, Decrypted shellcode, Flow of execution.]
18
Trace collection
• Studying 5 vulnerabilities
• Residential trace
  – 2-day trace
  – Windows XP SP2
  – 29 unused DSL IP addresses
  – 4,400 exploit samples
• Enterprise trace
  – 1 hour
  – Active responders
  – 5 × /24 subnets
  – 1,500 exploit samples
19
Shellcode extraction
• Shield (SIGCOMM '04)
  – Framework for specifying network-based protocols and vulnerabilities
  – Extracts shellcodes from raw network packets
20
Shellcode decryption
• Shellcode is encrypted
  – Use shellcode's own decryption loop!
• Limited emulation
  – Similar to generic decryption technique used for viruses
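The slide's approach lets the shellcode's own decryption loop do the work under limited emulation. As a much simpler stand-in, the sketch below assumes the decoder is a repeating 4-byte XOR loop and applies it directly in Python. There is no x86 emulation here, and the key, the payload, and the function name `xor_decode` are made up for illustration.

```python
def xor_decode(data: bytes, key: bytes) -> bytes:
    """Apply a repeating-key XOR, as a simple decoder loop would."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical 4-byte key and payload, for illustration only.
key = bytes([0x12, 0x34, 0x56, 0x78])
plain = b"example decoded payload"

# XOR is its own inverse, so the same routine encodes and decodes.
encoded = xor_decode(plain, key)
decoded = xor_decode(encoded, key)
```

A real decoder loop can differ in key length, operation, and layout, which is why emulating the loop itself is more general than hard-coding one scheme as this sketch does.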
21
Comparing samples: Candidate metrics
• Edit distance
  – Too specific: non-code portions of payload made related exploits unnecessarily distant
• Structural distance
  – Control flow graph over basic blocks
  – Basic blocks summarized with a color/hash
  – Too general: did not capture subtle instruction variations between exploit families
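To make the "too specific" point about edit distance concrete, here is a minimal sketch of the standard Levenshtein distance over byte strings. The two payloads are hypothetical: they share the same leading bytes and differ only in trailing filler, yet the plain edit distance charges for every filler byte.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Hypothetical payloads: identical leading bytes, different padding.
code = b"\x31\xc0\x50\x68"
payload_a = code + b"A" * 8
payload_b = code + b"B" * 8

padding_cost = edit_distance(payload_a, payload_b)
```

Here `padding_cost` equals the number of padding bytes, even though the leading bytes are identical.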
22
Comparing samples: Final metric
• Exedit distance metric
  – Edit distance over executed parts of shellcode
    • Distinguishes code from data
    • Maintains instruction-level details
Canonical string for shellcode
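A rough sketch of the exedit idea follows. It assumes each sample has already been disassembled into `(instruction, executed)` pairs, where the `executed` flag comes from some execution trace. That representation, the instruction strings, and the helper names are assumptions for illustration, not the format used in the study.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance over two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def canonical(sample):
    """Keep only the instructions that executed, in their original order."""
    return [insn for insn, executed in sample if executed]


def exedit_distance(s1, s2) -> int:
    """Edit distance restricted to the executed parts of two samples."""
    return edit_distance(canonical(s1), canonical(s2))


# Hypothetical samples: a and b differ only in unexecuted data bytes,
# while c differs from a in one executed instruction.
sample_a = [("xor eax,eax", True), ("push eax", True),
            ("db 0x41", False), ("call ebx", True)]
sample_b = [("xor eax,eax", True), ("push eax", True),
            ("db 0x42", False), ("db 0x43", False), ("call ebx", True)]
sample_c = [("xor eax,eax", True), ("push ecx", True), ("call ebx", True)]

same_code = exedit_distance(sample_a, sample_b)
one_insn_changed = exedit_distance(sample_a, sample_c)
```

In this toy case the data-only difference between `sample_a` and `sample_b` contributes nothing, while the single changed instruction in `sample_c` still counts.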
23
Cluster analysis
• Need to group samples using the exedit distance metric
• Agglomerative clustering
  – Each iteration, merge closest pair of clusters
  – Cluster distance = distance between the two furthest samples, one from each cluster
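The clustering step can be sketched in a few lines of plain Python. This is a naive complete-linkage implementation that stops merging once the closest pair of clusters is further apart than a threshold. Treating the threshold as a stopping rule, the function names, and the integer toy data are assumptions for illustration; a real run would pass shellcode samples and the exedit distance as `dist`.

```python
def complete_linkage(c1, c2, dist):
    """Cluster distance: distance between the two furthest samples."""
    return max(dist(a, b) for a in c1 for b in c2)


def agglomerative(samples, dist, threshold):
    """Repeatedly merge the closest pair of clusters until none is within threshold."""
    clusters = [[s] for s in samples]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = complete_linkage(clusters[i], clusters[j], dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data: integers with absolute difference as the distance.
samples = [0, 1, 2, 50, 51, 100]
clusters = agglomerative(samples, lambda a, b: abs(a - b), threshold=10)
```

With this toy input the six samples end up in three clusters. The pairwise search makes each iteration quadratic in the number of clusters, which is acceptable for a sketch but slow for large sample sets.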
24
Results
• Caught exploits for 5 vulnerabilities over traces
• Summary for residential trace

Vulnerability       Exploits   Unique exploits   Families
SQL Resolution           767                 2          1
LSASS                  1,769                56          5
ISystemActivator       1,561                90          6
RemoteActivation         338               338          2
25
ISystemActivator
10% clustering threshold
Need to manually verify this…
6 families
26
ISystemActivator
4-byte decoding key
Kernel-address loading function
Function-finding block
27
ISystemActivator
4-byte decoding key
Kernel-address loading function
Function-finding block
4-byte encoding key
Kernel base loader
Function finder
28
ISystemActivator
Longest payload
Many function blocks in middle of payload
29
ISystemActivator
Command-line call to “tftp.exe”
30
ISystemActivator
Different instructions in parts, otherwise very similar
31
ISystemActivator
“Bind” version
“Connect-back” version
32
Conclusions
• Systematic method for classifying exploits
  – Exploit collection
  – Shellcode extraction and decryption
  – Shellcode comparison using exedit distance
  – Group exploits with clustering
• Similarity between samples in computed phylogenies corresponded well with observed differences
• Useful step toward automating malware classification