Finding Diversity in Remote Code Injection Exploits


Justin Ma, John Dunagan, Helen J. Wang, Stefan Savage, Geoffrey M. Voelker

University of California, San Diego; Microsoft Research

2

Encountering new malware

Have I seen this before?

How closely related is it to what I have seen before?

3

Practical considerations


New defense?

4

Theoretical considerations


Evolutionary relationship?

5

Grouping similar malware together…

• Ultimately, construct malware families

• Anti-virus industry is active in this area

6

Motivation

710 new families, 40,000 new variants

Family and variant defined in ad-hoc fashion…

Is there a systematic way to determine the nature of this diversity?

7

Exploit diversity

Attacker

MS RPC Request Exploit

8

Polymorphism

Attacker

Encrypted

9

Behind the encryption…

Attacker

10

Differing constants

Attacker

Different IP address

11

Functional differences

Attacker

Waiting for a connection

12

Different code base

Attacker

Calling “tftp.exe”

13

ISystemActivator vulnerability

1,561 exploit attempts, 90 unique payloads

How different are they?

14

Our goal

• Automatically construct phylogeny, or family tree of exploits

15

Outline for this talk

• On classifying shellcodes

• Steps for systematically studying shellcodes
  – Trace collection
  – Shellcode extraction
  – Shellcode decryption
  – Comparing samples
  – Cluster analysis

• Post-hoc manual inspection to validate
  – Look at the code!

16

Why shellcodes?

• Our study focuses on exploits

• Shellcodes are packaged with the exploit
  – First foreign code that executes on a newly infected machine
  – Part of the exploit with the most leeway for variation

• Primary challenge: collecting and analyzing shellcodes

17

Remote code injection attacks

[Figure: an MS RPC request exploit arriving at the victim. Labels: victim's stack memory (low to high addresses), vulnerable buffer, shellcode, decrypted shellcode, flow of execution.]

18

Trace collection

• Studying 5 vulnerabilities

• Residential trace
  – 2-day trace
  – Windows XP SP2
  – 29 unused DSL IP addresses
  – 4,400 exploit samples

• Enterprise trace
  – 1-hour trace
  – Active responders
  – Five /24 subnets
  – 1,500 exploit samples

19

Shellcode extraction

• Shield (SIGCOMM '04)
  – Framework for specifying network-based protocols and vulnerabilities
  – Extracts shellcodes from raw network packets

20

Shellcode decryption

• Shellcode is encrypted
  – Use the shellcode's own decryption loop!

• Limited emulation
  – Similar to the generic decryption technique used for viruses
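The effect of running a decryption loop can be illustrated with a toy decoder. This is only a sketch, not the emulator used in the study: it assumes a repeating XOR key (the encoding scheme is not specified on this slide), and the payload bytes, the key, and the `xor_decode` helper are hypothetical.

```python
def xor_decode(encoded: bytes, key: bytes) -> bytes:
    """Undo a repeating-key XOR encoding (XOR is its own inverse)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(encoded))


# Hypothetical 4-byte key and payload, for illustration only.
key = bytes.fromhex("99aabbcc")
plain = b"\x90\x90cmd /c tftp.exe"

encoded = xor_decode(plain, key)    # what would travel over the network
decoded = xor_decode(encoded, key)  # what a decryption loop would leave in memory
```

A limited emulator reaches the same result without knowing the key in advance, because it executes the decoder instructions that ship with the shellcode.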

21

Comparing samples: Candidate metrics

• Edit distance
  – Too specific: non-code portions of the payload made related exploits unnecessarily distant

• Structural distance
  – Control flow graph over basic blocks
  – Basic blocks summarized with a color/hash
  – Too general: did not capture subtle instruction variations between exploit families
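Why plain edit distance is "too specific" can be seen in a small sketch. It uses the standard Levenshtein dynamic program over raw bytes; the byte strings are hypothetical and stand in for two payloads that share code but carry different filler data.

```python
def edit_distance(a: bytes, b: bytes) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


code = b"\x31\xc0\x50\x68"    # hypothetical code bytes shared by both samples
sample_a = code + b"A" * 16   # identical code, different non-code filler
sample_b = code + b"B" * 16

raw_distance = edit_distance(sample_a, sample_b)
```

Here the code portions are identical, yet every filler byte counts as a difference, so the two samples look far apart.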

22

Comparing samples: Final metric

• Exedit distance metric
  – Edit distance over executed parts of shellcode

• Distinguishes code from data

• Maintains instruction-level details

Canonical string for shellcode
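A minimal sketch of the idea follows. It assumes each sample has already been reduced to the sequence of instructions that actually executed, and it normalizes by the longer sequence so the result can be read as a fraction; both the normalization and the instruction strings are assumptions made for illustration, not details taken from the slides.

```python
def sequence_edit_distance(a: list, b: list) -> int:
    """Edit distance where each element is one executed instruction."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def exedit_distance(trace_a: list, trace_b: list) -> float:
    """Edit distance over executed instructions, scaled to [0, 1] (assumed)."""
    longest = max(len(trace_a), len(trace_b))
    if longest == 0:
        return 0.0
    return sequence_edit_distance(trace_a, trace_b) / longest


# Hypothetical execution traces; non-executed data never enters the comparison.
bind = ["xor eax,eax", "push eax", "call socket",
        "call bind", "call listen", "call accept"]
connect_back = ["xor eax,eax", "push eax", "call socket", "call connect"]

d = exedit_distance(bind, connect_back)
```

Because only executed instructions are compared, differing filler bytes no longer inflate the distance, while a single changed instruction still registers.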

23

Cluster analysis

• Need to group samples using the exedit distance metric

• Agglomerative clustering
  – Each iteration, merge the closest pair of clusters
  – Cluster distance = distance between the furthest pair of samples in the two clusters
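The merge rule above (often called complete linkage) can be sketched in a few lines. Stopping once the closest pair exceeds a threshold is an assumption about how families are cut from the tree; the `agglomerate` helper and the integer samples are hypothetical stand-ins for shellcodes and their exedit distances.

```python
from itertools import combinations


def complete_linkage(c1: list, c2: list, dist) -> float:
    """Cluster distance: the furthest pair of samples across two clusters."""
    return max(dist(a, b) for a in c1 for b in c2)


def agglomerate(samples: list, dist, threshold: float) -> list:
    """Repeatedly merge the closest pair of clusters until none is within threshold."""
    clusters = [[s] for s in samples]
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), complete_linkage(clusters[i], clusters[j], dist))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


# Toy samples on a number line; distance is the absolute difference.
families = agglomerate([0, 5, 50, 52, 90], lambda a, b: abs(a - b), threshold=10)
```

With the furthest-pair rule, a cluster only absorbs another if every sample in both stays within the threshold of every other, which keeps families tight.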

24

Results

• Caught exploits for 5 vulnerabilities across the traces

• Summary for residential trace

Vulnerability      Exploits  Unique exploits  Families
SQL Resolution          767                2         1
LSASS                 1,769               56         5
ISystemActivator      1,561               90         6
RemoteActivation        338              338         2

25

ISystemActivator

10% clustering threshold

Need to manually verify this…

6 families

26

ISystemActivator

4-byte decoding key

Kernel-address loading function

Function-finding block

27

ISystemActivator

4-byte decoding key

Kernel-address loading function

Function-finding block

4-byte encoding key

Kernel base loader

Function finder

28

ISystemActivator

Longest payload

Many function blocks in middle of payload

29

ISystemActivator

Command-line call to “tftp.exe”

30

ISystemActivator

Different instructions in parts, otherwise very similar

31

ISystemActivator

“Bind” version

“Connect-back” version

32

Conclusions

• Systematic method for classifying exploits
  – Exploit collection
  – Shellcode extraction and decryption
  – Shellcode comparison using exedit distance
  – Group exploits with clustering

• Similarity between samples in computed phylogenies corresponded well with observed differences

• Useful step toward automating malware classification
