65
Background Malware Revese Engineering Summary CS 6V81-05: System Security and Malicious Code Analysis Fighting for Malware: Unpacking, Disassembling, Decompilation Zhiqiang Lin Department of Computer Science University of Texas at Dallas April 18 th , 2012

CS 6V81-05: System Security and Malicious Code …zxl111930/spring2012/public/lec23.pdfFighting for Malware: Unpacking, Disassembling, Decompilation Zhiqiang Lin Department of Computer

  • Upload
    docong

  • View
    218

  • Download
    4

Embed Size (px)

Citation preview

Background Malware Revese Engineering Summary

CS 6V81-05: System Security and Malicious Code Analysis

Fighting for Malware: Unpacking, Disassembling,Decompilation

Zhiqiang Lin

Department of Computer ScienceUniversity of Texas at Dallas

April 18th, 2012

Background Malware Revese Engineering Summary

Outline

1 Background

2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding

3 Summary

Outline

1 Background

2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding

3 Summary

Background Malware Revese Engineering Summary

Background

Background Malware Revese Engineering Summary

Background

State-of-the-art in Today’s malwareLarge botnetsDiverse propagation vectors, exploits, C&CCapabilities – backdoor, keylogging, rootkits,Logic bombs, time-bombsDiverse targets: desktops, mobile platforms, SCADAsystems (Stuxnet)

Credit: Most of the slides in this lecture are compiled from HassenSaidi’s presentation on “Reverse Engineering Malware”

Background Malware Revese Engineering Summary

Motivation

Malware is not about script-kiddies anymore, it’s realbusiness.Recent events indicate that it can be a powerful weapon incyber warfare.Manual reverse-engineering is close to impossible

Need automated techniques to extract system logic,interactions and side-effects, derive intent, and devisemitigating strategies.

Background Malware Revese Engineering Summary

Capturing Malware

Honeynets: Capture malware that scans the Internet forvulnerable targetsMining SPAM for attachmentsMining SPAM for malicious URLs, and capturing drive-bydownloadsAV heuristics

Background Malware Revese Engineering Summary

Malware Analysis = Reverse EngineeringMalware Binary Analysis

01001010100101010

10101010011010101

01001010100101010

10101010011010101

01001010100101010

10101010011010101

01001010100101010

10101010011010101

10101010011010101

01001010100101010

10101010011010101

.exe

Typically a stripped

binary with no

debugging information.

In the case of malicious

code, it is often obfuscated

and packed

Often has embedded suicide logic and

anti-analysis logic

• What does the malware do

• How does it do it

• identify triggers

• What is the purpose of the

malware

• is this an instance of a known

threat or a new malware

• who is the author

• …

Challenges:

• lack of automation

• time-critical analysis

• labor intensive

• requires a human in the loop

What does the malware do

How does it do it

Identify triggers

What is the purpose of the malware

Is this an instance of a known threat or a new

malware

who is the author

. . .

Typically a stripped binary with no debugging information.

In the case of malicious code, it is often obfuscated and

packed

Often has embedded suicide logic and anti-analysis logic

Challengeslack of automation

time-critical analysis

labor intensive

requires a human in the loop

Background Malware Revese Engineering Summary

Dynamic vs Static Malware Analysis

Dynamic AnalysisTechniques that profile actions of binary at runtimeMore popular

CWSandbox, TTAnalyze, multipath explorationOnly provides partial “effects-oriented profile’’ of malwarepotential

Static AnalysisCan provide complementary insightsPotential for more comprehensive assessment

Background Malware Revese Engineering Summary

Malware Evasions and Obfuscations

To defeat signature based detection schemes

Polymorphism, metamorphism: started appearing inviruses of the 90’s primarily to defeat AV tools

To defeat Dynamic Malware Analysis

Anti-debugging, anti-tracing, anti-memory dumpingVMM detection, emulator detection

To defeat Static Malware analysis

Encryption (packing)API and control-flow obfuscationsAnti-disassembly

The main purpose of obfuscation is to slow down thesecurity community

Background Malware Revese Engineering Summary

Malware Revere Engineering

Goal: Systematic and AutomaticUnpack most of contemporary malwareHandle most if not all packersDeobfuscate API referencesAutomate identification of capabilitiesProvide feedback on unpacking successSimplify and annotate call graphs to illustrate interactionsbetween key logical blocksEnable decompilation of assemply code into a higher-levellanguageIdentify key logical blocks (crypto, unpacking for instance)Reuse certain logical parts...

Background Malware Revese Engineering Summary

Reverse Engineering Phases

Phase-I: Unpacking

Unpacking phase: the image of a running malware sampleis often considered damaged:

No known OEP. Imported APIs are invoked dynamically andthe original import table is destroyed. Arbitrary sectionnames and r/w/e permissions.

Phase-II: DisassemblyIdentification of code and data segmentsRelies on the unpacker to capture all code and datasegments.

Background Malware Revese Engineering Summary

Reverse Engineering Phases

Phase-III: DecompilationReconstruction of the code segment into a C-like higherlevel representationRelies on the disassembler to recognize functionboundaries, targets of call sites, imports, and OEP..

Phase-IV: Program understanding

Relies on the decompiler to produce readable C code, byrecognizing the compiler, calling conventions, stack framesmanipulation, functions prologs and epilogs, user-defineddata structures.

Outline

1 Background

2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding

3 Summary

Outline

1 Background

2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding

3 Summary

Background Malware Revese Engineering Summary

Phase 1: Malware Unpacking

Phase 1: Malware Unpacking

Unpacking

Background Malware Revese Engineering Summary

What’s the problem

Executable compressionExecutable compression is any means of compressing anexecutable file and combining the compressed data withdecompression code into a single executable. When thiscompressed executable is executed, the decompression coderecreates the original code from the compressed code beforeexecuting it. In most cases this happens transparently so thecompressed executable can be used in exactly the same wayas the original.

Code → Data → Code

http://en.wikipedia.org/wiki/Executable_compression

Background Malware Revese Engineering Summary

Available Packer

Background Malware Revese Engineering Summary

When decompression/unpacking finishes?

Heuristic-based UnpackingHeuristic #1: Dump as late as possible.NtTerminateProcessHeuristic #2: Dump when your program generates errors.NtRaiseHardErrorHeuristic #3: Dump when program forks a child process.NtCreateProcessHeursitic #4: Dump when program sends/recv packets....

IssuesWeak adversarial model, too simple to evade. . .Multi-layer packing?

Background Malware Revese Engineering Summary

When decompression/unpacking finishes?

Statistics-based Unpacking

Observations

Statistical properties of packed executable differ fromunpacked exectuableAs malware executes code-to-data ratio increases

Complications

Code and data sections are interleaved in PE executablesData directories(import tables) look similar to data but areoften found in code sectionsProperties of data sections vary with packers

Background Malware Revese Engineering Summary

Overview of Unpacker

Static AnalysisDecompile and analyze the logical structure, flow, and datastored within the binary itself.Able to figure out what the packer is by fingerprinting thestructure of the codeIt is difficult, if possible, to reveal the binary code withoutreal execution

Dynamic analysisMonitor the behavior of the malware binary at runtime.

Fine-grained monitor (Instruction-level)Coarse-grained monitor (page-level)

Background Malware Revese Engineering Summary

Generic Automatic Unpackers

Generic Automatic Unpackers

PolyUnpack Renovo OmniUnpack Eureka

Instruction-level Instruction-level Page-level System call level

Model-base trigger

Heuristic triggerHeuristic trigger Heuristic andStatistical trigger

slow slow fast fast

Justin

Page-level

Stack/environmenttrigger

fast

ACSAC 2006 Worm 2007 ACSAC 2007 ESORICS 2008 RAID 2008

Background Malware Revese Engineering Summary

Unpacker Case Study: The Eureka Framework

Novel unpacking technique based on coarse grainedexecution tracingHeuristic-based and statistic-based upackingImplements several techniques to handle obfucated APIreferencesMultiple metrics to evaluate unpack successAnnotated call graphs provide bird’s eye view of systeminteraction

http://eureka.cyber-ta.org/

Background Malware Revese Engineering Summary

The Eureka WorkflowThe Eureka Workflow

Trace Malware syscalls in

VM

Syscall trace

Heuristic based offline

analysis

Eureka’s

Unpacker

Favorable execution

point

Packed Binary

Un- packed Binary

Dis-assembly IDA-Pro

Un-Packed .ASM

Dis-assembly IDA-Pro

Packed .ASM

Statistics based

Evaluator

Unpack Evaluatio

n

Raw unpacked Executable

•Unknown OEP •No debug information •Unresolved library calls •Snapshot of data segment •Unreachable code •Loss of structures Statistics

based Evaluator

Background Malware Revese Engineering Summary

EurekaEureka

Coarse-grained

execution

tracing

NtTerminateProcess

NtCreateProcess

Eureka

Statistical

bigram analysis

bigram.

Background Malware Revese Engineering Summary

Coarse-grained Execution Monitoring

Generalized unpacking principle

Execute binary till it has sufficiently revealed itself, such asthe event of program exit as a trigger (NtTerminateProcessimplies that the unpacked malicious payload has beensuccessfully decrypted).Dump the process execution image for static analysis

Monitoring execution progress

Eureka employs a Windows driver that hooks to SSDT(System Service Dispatch Table)Callback invoked on each NTDLL system callFiltering based on malware process pid

Background Malware Revese Engineering Summary

Problems

Not all malware exit and keep an executing versionresident in memory

Packers can make spurious event of creating new process.Malware authors can simply avoid exiting the malwareprocess.The above two simple heuristics may work for a largefraction of malware today (as much as 80%), it may not bethe same for future malware.

Background Malware Revese Engineering Summary

Statistical bigram analysis

Mining statistical patterns in x86 code

Use simple n-gram analysisUse the IDA Pro to extract regions from executable thatwere marked as functions.Looking for the most common bigrams ( opcode pairs or2-byte opcodes) and space bigrams( byte pairs separatedby 1 or more bytes)Found FF 15(call) , FF 75(push), E8—00 and E8—FF areprevalent in x86 code.

Background Malware Revese Engineering Summary

Occurrence summary of bigrams

Occurrence summary of bigrams

calc explorer notepad ping shutdown

FF 15(call) 246 3045 415 58 132

FF 75(push) 235 2494 245 41 85

E8---FF(call) 1583 2201 180 87 49

E8---00(call) 746 1091 108 57 66

Background Malware Revese Engineering Summary

Code-to-data ratio

Grey area stand for dataBlue area stand for code

Original notepad.exe memory space

Packed notepad.exe memory space

Background Malware Revese Engineering Summary

Bigram Counts

There are consistent and significant shifts in the bigramcounts.The simple bigram counting approach had over a 95%success rate in distinguishing between packed andunpacked malware instance.

Background Malware Revese Engineering Summary

Evaluation (ASPack)

Background Malware Revese Engineering Summary

Evaluation (MoleBox)

Background Malware Revese Engineering Summary

Evaluation (Armadillo)

Background Malware Revese Engineering Summary

Evaluation

Background Malware Revese Engineering Summary

Systematic Approach Unpacking

Automatic Unpacking: involves running the malware andcapturing its memory image.Monitoring the execution of the malware is an intrusiveprocess and is often detected using anti-tracing andanti-debugging techniques embedded in the malware.Eureka consists of minimal monitoring and capturing theprocess image at key events:

ExitProcessByte bigram monitoring: call, push instructions for instanceNumber of seconds elapsedRun the malware without monitoring and suspend itsexecution and perform memory inspection

In practice, Eureka always manages to get a dump(memory snapshot) of the running process: no OEP andno Import table

Outline

1 Background

2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding

3 Summary

Background Malware Revese Engineering Summary

Phase 2: Disassembly

The disassembler reads the PE data structure in order to:1 Determine the different sections of the file and separate code from data

and identifies resource information such as import tables

The disassembler relies on the PE data structure (could be corrupt)The disassembler translates into code, any referenced address fromknown code location

2 Translate code segments into assembly language

The disassembler relies on the hardware instruction setdocumentation

3 Interpret data according to identified types

A data referenced by code can be of any type: integer, string, struct,etc.

Integer:0x0040F45C dword_40F45C dd 0E06D7363h, 1, 2 dup(0) ; DATA XREF: 408C98String:0x0040F45C unk_40F45C db 63h ; c ; DATA XREF: sub_408C980x0040F45D db 73h ; s0x0040F45E db 6Dh ; m0x0040F45F db 0E0h ; a

Background Malware Revese Engineering Summary

IDA Pro Disassembler

http://www.hex-rays.com/idapro/

It supports a variety of executable formats for differentprocessors and operating systems. It also can be used as adebugger for Windows PE, Mac OS X, and Linux ELFexecutables.IDA performs a large degree of automatic code analysis toa certain extent, leveraging cross-references between codesections, knowledge of parameters of API calls, limiteddataflow analysis, and recognition of standard libraries.

Hashes of known statically linked libraries are compared tohashes of identified subroutines in the code

Provides scripting languages to interact with the system toimprove the analysis.Support plug-ins: The IDA decompiler is the mostimpressive plug-in.

Background Malware Revese Engineering Summary

PE Execution

1 Read the PortableExecutable (PE) file datastructure and maps thefile into memory

2 Load import modules

1 Start execution at entrypoint

2 Runtime unpacking3 Jump to OEP

Background Malware Revese Engineering Summary

Fixing the Disassembled Code

Unpacked & disassembled code does not have an OEP, asshown in Eureka framework.Import tables are rebuilt dynamically and there are nostatic references to dynamically loaded librariesHeader information is not reliableData is not typed

Background Malware Revese Engineering Summary

Parsing the PE executable format

Background Malware Revese Engineering Summary

Challenges in Binary Code Disassembly

Disassembly is not an exact science: On CISC platformswith variable-width instructions, or in the presence ofself-modifying code, it is possible for a single program tohave two or more reasonable disassemblies. Determiningwhich instructions would actually be encountered during arun of the program reduces to the proven-unsolvablehalting problem.Bad disassembly because of variable length instructionsJumps into middle of instructionsNo reachability analysis: Unreachable code can hide data.

Background Malware Revese Engineering Summary

Examples of Disassembly problems (The StormWorm)

Data hidden as code:ArcadeWorld.exe:0042BDB0 mov eax, 0ArcadeWorld.exe:0042BDB5 test eax, eaxArcadeWorld.exe:0042BDB7 jnz short loc_42BDD8; unreachable code

ArcadeWorld.exe:0042BDD8 loc_42BDD8: ; CODE XREF: 0042BDB7ArcadeWorld.exe:0042BDD8 lea edx, [ebp+401C42h]ArcadeWorld.exe:0042BDDE lea eax, [ebp+40B220h]ArcadeWorld.exe:0042BDE4 push eaxArcadeWorld.exe:0042BDE5 push 8A20hArcadeWorld.exe:0042BDEA push edxArcadeWorld.exe:0042BDEB call near ptr unk_42BF7DArcadeWorld.exe:0042BDF0 pushaArcadeWorld.exe:0042BDF1 mov edi, [ebp+40C067h]ArcadeWorld.exe:0042BDF7 add edi, [ebp+40C03Fh]ArcadeWorld.exe:0042BDFD lea esi, [ebp+40BFBFh]

Background Malware Revese Engineering Summary

API Resolution

User-level malware programs require system calls toperform malicious actionsUse Win32 API to access user level librariesObufscations impede malware analysis using IDA Pro orOllyDbg

Packers use non-standard linking and loading of dllsObfuscated API resolution

Background Malware Revese Engineering Summary

Standard API Resolution

Imports in IAT identified by IDA by looking at Import Table

Background Malware Revese Engineering Summary

Handling Thunks

Identify subroutines with a JMP instruction onlyTreat any calls to these subs as an API call

Handling Thunks

IsDebuggerPresent

Background Malware Revese Engineering Summary

Leveraging Standard API Address Loading

==================================================Function Name : ADSICloseDSObjectAddress : 0x76e30826Relative Address : 0x00020826Ordinal : 142 (0x8e)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================

==================================================Function Name : ADSICloseSearchHandleAddress : 0x76e3050aRelative Address : 0x0002050aOrdinal : 143 (0x8f)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================

==================================================Function Name : ADSICreateDSObjectAddress : 0x76e30447Relative Address : 0x00020447Ordinal : 144 (0x90)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================

Background Malware Revese Engineering Summary

Using Dataflow Analysis

Identify register based indirect calls

Using Dataflow Analysis

• Identify register based indirect calls GetEnvironmentStringW

use

def

Background Malware Revese Engineering Summary

Handling Dynamic Pointer Updates

Identify register based indirect callsHandling Dynamic Pointer Updates

• Identify register based indirect calls

dword_41e304 has no static value to look up API

use

def

A def to dword_41e308 is found Look for probable call to GetProcAddress earlier

Call to GetProcAddress

Background Malware Revese Engineering Summary

Rebuilding the unpacked executable

From a damaged dumped image of a running malware to aPE executable:

Knowing all APIs allows us to identify the OEP.Semantic approach: ExitProcess, CreateMutex,GetCommandLine, GetModulaHandle, etc are close toOEP.There are about 20 APIs that are often called at thebeginning of the execution of the code.Structural approach: find sources of call graphs in thebinaryRebuilding in import table with all references to identifiedAPIs

The disassembly of the reconstructed PE is often of betterquality than the disassembly of the dumped process image

The new PE code bypasses the unpacking routineembedded in the packed codeThe new PE contains the original code

Outline

1 Background

2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding

3 Summary

Background Malware Revese Engineering Summary

Phase 3: Decompilation

Identifies local variablesIdentifies arguments: registers, stack, or any combinationIdentifies global variablesIdentify calling conventionsIdentifies common idioms and compiler featuresEliminates the use of registers as intermediate variablesIdentifies control structures

Background Malware Revese Engineering Summary

Decompilation Depends on Previous Analysis PhasesDecompilation Depends on Previous Analysis Phases

Background Malware Revese Engineering Summary

Analysis Phases

Analysis Phases

Compiler

Source Code

Executable code

Assembly code

Legitimate C/C++

that a compiler

would generate

Disassembly &

Analysis

Decompilation

Ideally

Unpacking

Malware

Non-executable

code

Obfuscated

assembly

A mess

Disassembly &

Analysis

Decompilation

Reality

Assembly code

Legitimate C/C++

Undo

Obfuscation

Decompilation

Background Malware Revese Engineering Summary

Example of Binary RewriteExample of Binary Rewrite

.text:009B327C OBFUSCATED_VERSION_OF_is_private_subnet proc near

.text:009B327C mov ecx, eax

.text:009B327E and ecx, 0FFFFh

.text:009B3284 cmp ecx, 0A8C0h

.text:009B328A jz loc_9B4264

.text:009B3290 jmp off_9BAAA5

/* .text:009BAAA5 off_9BAAA5 dd offset loc_9AB10C */

.text:009B3290 OBFUSCATED_VERSION_OF_is_private_subnet endp

.

.text:009B4264 loc_9B4264:

.text:009B4264

.text:009B4264 mov eax, 1

.text:009B4269 retn

.

.text:009AB10C cmp al, 0Ah

.text:009AB10E jz loc_9B4264

.text:009AB114 jmp off_9BA137

/* .text:009BA137 off_9BA137 dd offset loc_9B2CE4 */

.

.text:009B2CE4 and eax, 0F0FFh

.text:009B2CE9 cmp eax, 10ACh

.text:009B2CEE jz loc_9B4264

.text:009B2CF4 jmp off_9BA3CC

/* .text:009BA3CC off_9BA3CC dd offset loc_9ABA24 */

.

.text:009ABA24 xor eax, eax

.text:009ABA26 retn

.text:009A1311 is_private_subnet proc near

.text:009A1311 mov ecx, eax

.text:009A1313 and ecx, 0FFFFh

.text:009A1319 cmp ecx, 0A8C0h

.text:009A131F jz loc_9A1340

.text:009A1325 cmp al, 0Ah

.text:009A1327 jz loc_9A1340

.text:009A132D and eax, 0F0FFh

.text:009A1332 cmp eax, 10ACh

.text:009A1337 jz loc_9A1340

.text:009A133D xor eax, eax

.text:009A133F retn

.text:009A1340

.text:009A1340 loc_9A1340:

.text:009A1340 mov eax, 1

.text:009A1345 retn

.text:009A1345 is_private_subnet endp

bool __usercall is_private_subnet (unsigned __int16 a1) {

return a1 == 43200 || a1 == 10 || (a1 & 0xF0FF) == 4268;

}

��int __usercall OBFUSCATED_VERSION_OF_is_private_subnet<eax>(unsigned __int16 a1 <ax>)

{

int result; // eax@2

if ( a1 == 43200 ) result = 1;

else result = off_9BAAA5();

return result;

}

Outline

1 Background

2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding

3 Summary

Background Malware Revese Engineering Summary

Phase 4: Program Understanding

Need to identify higher-level concepts from thedeobfuscated codeNeed to interpret the code into a higher-level malwareobjectiveNeed to indentify particular features: crypto:

Functions that use crypto-related opcode, loops, etcKnown constants in crypto algorithms

Background Malware Revese Engineering Summary

Finding Known and Unknown Crypto

Outline

1 Background

2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding

3 Summary

Background Malware Revese Engineering Summary

Malware analysis: Deobfuscation

Malware author designs obfuscation to

Packing: to reduce the size of binaries and to createpolymorphic malware samplesBesides packing, the more advanced obfuscationtechniques are designed to slow down reverse engineeringefforts and to prevent:

the identification of API calls: identify the basic buildingblocks of the malwarethe control-flow reconstruction of the malware: follow andreconstruct the logic flowstatic analysis: determine the full functionality, triggers,hidden logic, time bombs, etc.

Background Malware Revese Engineering Summary

Why Code Obfuscation is not Easy

Malware authors can design binary code that is extremelydifficult to analyze. Using advanced programminglanguages knowledge, it is possible to create such code.Malware code should be able to run in a reliable manner.Obfuscation should not compromise this importantrequirement and should maintain the reliability of the initialcode. This requires a proof or guarantee of some sort.

Background Malware Revese Engineering Summary

Stuxnet: Keeping it “relatively” simple

Stuxnet does not use advanced binary obfuscationtechniques.The analysis of the code is challenging neverthelessStuxnet Code Characteristics:

Use of C++Use of C++ exception handlingUse of C++ classesUse of simple data encoding (encryption)Use of C structures for all data passed to the mainsubroutines:

Over 40 user-defined structuresNot recognized by disassemblers and decompilers

Background Malware Revese Engineering Summary

Summary

It is always desirable to recover from the malware adescription that is as close as possible to the original codeproduced by the authors.It is often possible to do that in practiceIt is often the only way to really determine the full capabilityof the malwareMalware analysis: deobfuscation + reverse engineeringUnpacking, Disassembling, Decompilation

Background Malware Revese Engineering Summary

Further Reading

PolyUnpack: Automating the Hidden-Code Extraction ofUnpack Malware. In ACSAC 2006OmniUnpack: Fast, Generic, and Safe Unpacking ofMalware. In ACSAC 2007Renovo: A hidden code extractor for packed executables.In WORM 2007Eureka: A Framework for Enabling Static MalwareAnalysis. In ESORICS 2008Justin: A Study of the Packer Problem and Its Solutions. InRAID 2008