Upload
docong
View
218
Download
4
Embed Size (px)
Citation preview
Background Malware Revese Engineering Summary
CS 6V81-05: System Security and Malicious Code Analysis
Fighting for Malware: Unpacking, Disassembling,Decompilation
Zhiqiang Lin
Department of Computer ScienceUniversity of Texas at Dallas
April 18th, 2012
Background Malware Revese Engineering Summary
Outline
1 Background
2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding
3 Summary
Outline
1 Background
2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding
3 Summary
Background Malware Revese Engineering Summary
Background
State-of-the-art in Today’s malwareLarge botnetsDiverse propagation vectors, exploits, C&CCapabilities – backdoor, keylogging, rootkits,Logic bombs, time-bombsDiverse targets: desktops, mobile platforms, SCADAsystems (Stuxnet)
Credit: Most of the slides in this lecture are compiled from HassenSaidi’s presentation on “Reverse Engineering Malware”
Background Malware Revese Engineering Summary
Motivation
Malware is not about script-kiddies anymore, it’s realbusiness.Recent events indicate that it can be a powerful weapon incyber warfare.Manual reverse-engineering is close to impossible
Need automated techniques to extract system logic,interactions and side-effects, derive intent, and devisemitigating strategies.
Background Malware Revese Engineering Summary
Capturing Malware
Honeynets: Capture malware that scans the Internet forvulnerable targetsMining SPAM for attachmentsMining SPAM for malicious URLs, and capturing drive-bydownloadsAV heuristics
Background Malware Revese Engineering Summary
Malware Analysis = Reverse EngineeringMalware Binary Analysis
01001010100101010
10101010011010101
01001010100101010
10101010011010101
01001010100101010
10101010011010101
01001010100101010
10101010011010101
10101010011010101
01001010100101010
10101010011010101
.exe
Typically a stripped
binary with no
debugging information.
In the case of malicious
code, it is often obfuscated
and packed
Often has embedded suicide logic and
anti-analysis logic
• What does the malware do
• How does it do it
• identify triggers
• What is the purpose of the
malware
• is this an instance of a known
threat or a new malware
• who is the author
• …
Challenges:
• lack of automation
• time-critical analysis
• labor intensive
• requires a human in the loop
What does the malware do
How does it do it
Identify triggers
What is the purpose of the malware
Is this an instance of a known threat or a new
malware
who is the author
. . .
Typically a stripped binary with no debugging information.
In the case of malicious code, it is often obfuscated and
packed
Often has embedded suicide logic and anti-analysis logic
Challengeslack of automation
time-critical analysis
labor intensive
requires a human in the loop
Background Malware Revese Engineering Summary
Dynamic vs Static Malware Analysis
Dynamic AnalysisTechniques that profile actions of binary at runtimeMore popular
CWSandbox, TTAnalyze, multipath explorationOnly provides partial “effects-oriented profile’’ of malwarepotential
Static AnalysisCan provide complementary insightsPotential for more comprehensive assessment
Background Malware Revese Engineering Summary
Malware Evasions and Obfuscations
To defeat signature based detection schemes
Polymorphism, metamorphism: started appearing inviruses of the 90’s primarily to defeat AV tools
To defeat Dynamic Malware Analysis
Anti-debugging, anti-tracing, anti-memory dumpingVMM detection, emulator detection
To defeat Static Malware analysis
Encryption (packing)API and control-flow obfuscationsAnti-disassembly
The main purpose of obfuscation is to slow down thesecurity community
Background Malware Revese Engineering Summary
Malware Revere Engineering
Goal: Systematic and AutomaticUnpack most of contemporary malwareHandle most if not all packersDeobfuscate API referencesAutomate identification of capabilitiesProvide feedback on unpacking successSimplify and annotate call graphs to illustrate interactionsbetween key logical blocksEnable decompilation of assemply code into a higher-levellanguageIdentify key logical blocks (crypto, unpacking for instance)Reuse certain logical parts...
Background Malware Revese Engineering Summary
Reverse Engineering Phases
Phase-I: Unpacking
Unpacking phase: the image of a running malware sampleis often considered damaged:
No known OEP. Imported APIs are invoked dynamically andthe original import table is destroyed. Arbitrary sectionnames and r/w/e permissions.
Phase-II: DisassemblyIdentification of code and data segmentsRelies on the unpacker to capture all code and datasegments.
Background Malware Revese Engineering Summary
Reverse Engineering Phases
Phase-III: DecompilationReconstruction of the code segment into a C-like higherlevel representationRelies on the disassembler to recognize functionboundaries, targets of call sites, imports, and OEP..
Phase-IV: Program understanding
Relies on the decompiler to produce readable C code, byrecognizing the compiler, calling conventions, stack framesmanipulation, functions prologs and epilogs, user-defineddata structures.
Outline
1 Background
2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding
3 Summary
Outline
1 Background
2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding
3 Summary
Background Malware Revese Engineering Summary
Phase 1: Malware Unpacking
Phase 1: Malware Unpacking
Unpacking
Background Malware Revese Engineering Summary
What’s the problem
Executable compressionExecutable compression is any means of compressing anexecutable file and combining the compressed data withdecompression code into a single executable. When thiscompressed executable is executed, the decompression coderecreates the original code from the compressed code beforeexecuting it. In most cases this happens transparently so thecompressed executable can be used in exactly the same wayas the original.
Code → Data → Code
http://en.wikipedia.org/wiki/Executable_compression
Background Malware Revese Engineering Summary
When decompression/unpacking finishes?
Heuristic-based UnpackingHeuristic #1: Dump as late as possible.NtTerminateProcessHeuristic #2: Dump when your program generates errors.NtRaiseHardErrorHeuristic #3: Dump when program forks a child process.NtCreateProcessHeursitic #4: Dump when program sends/recv packets....
IssuesWeak adversarial model, too simple to evade. . .Multi-layer packing?
Background Malware Revese Engineering Summary
When decompression/unpacking finishes?
Statistics-based Unpacking
Observations
Statistical properties of packed executable differ fromunpacked exectuableAs malware executes code-to-data ratio increases
Complications
Code and data sections are interleaved in PE executablesData directories(import tables) look similar to data but areoften found in code sectionsProperties of data sections vary with packers
Background Malware Revese Engineering Summary
Overview of Unpacker
Static AnalysisDecompile and analyze the logical structure, flow, and datastored within the binary itself.Able to figure out what the packer is by fingerprinting thestructure of the codeIt is difficult, if possible, to reveal the binary code withoutreal execution
Dynamic analysisMonitor the behavior of the malware binary at runtime.
Fine-grained monitor (Instruction-level)Coarse-grained monitor (page-level)
Background Malware Revese Engineering Summary
Generic Automatic Unpackers
Generic Automatic Unpackers
PolyUnpack Renovo OmniUnpack Eureka
Instruction-level Instruction-level Page-level System call level
Model-base trigger
Heuristic triggerHeuristic trigger Heuristic andStatistical trigger
slow slow fast fast
Justin
Page-level
Stack/environmenttrigger
fast
ACSAC 2006 Worm 2007 ACSAC 2007 ESORICS 2008 RAID 2008
Background Malware Revese Engineering Summary
Unpacker Case Study: The Eureka Framework
Novel unpacking technique based on coarse grainedexecution tracingHeuristic-based and statistic-based upackingImplements several techniques to handle obfucated APIreferencesMultiple metrics to evaluate unpack successAnnotated call graphs provide bird’s eye view of systeminteraction
http://eureka.cyber-ta.org/
Background Malware Revese Engineering Summary
The Eureka WorkflowThe Eureka Workflow
Trace Malware syscalls in
VM
Syscall trace
Heuristic based offline
analysis
Eureka’s
Unpacker
Favorable execution
point
Packed Binary
Un- packed Binary
Dis-assembly IDA-Pro
Un-Packed .ASM
Dis-assembly IDA-Pro
Packed .ASM
Statistics based
Evaluator
Unpack Evaluatio
n
Raw unpacked Executable
•Unknown OEP •No debug information •Unresolved library calls •Snapshot of data segment •Unreachable code •Loss of structures Statistics
based Evaluator
Background Malware Revese Engineering Summary
EurekaEureka
Coarse-grained
execution
tracing
NtTerminateProcess
NtCreateProcess
Eureka
Statistical
bigram analysis
bigram.
Background Malware Revese Engineering Summary
Coarse-grained Execution Monitoring
Generalized unpacking principle
Execute binary till it has sufficiently revealed itself, such asthe event of program exit as a trigger (NtTerminateProcessimplies that the unpacked malicious payload has beensuccessfully decrypted).Dump the process execution image for static analysis
Monitoring execution progress
Eureka employs a Windows driver that hooks to SSDT(System Service Dispatch Table)Callback invoked on each NTDLL system callFiltering based on malware process pid
Background Malware Revese Engineering Summary
Problems
Not all malware exit and keep an executing versionresident in memory
Packers can make spurious event of creating new process.Malware authors can simply avoid exiting the malwareprocess.The above two simple heuristics may work for a largefraction of malware today (as much as 80%), it may not bethe same for future malware.
Background Malware Revese Engineering Summary
Statistical bigram analysis
Mining statistical patterns in x86 code
Use simple n-gram analysisUse the IDA Pro to extract regions from executable thatwere marked as functions.Looking for the most common bigrams ( opcode pairs or2-byte opcodes) and space bigrams( byte pairs separatedby 1 or more bytes)Found FF 15(call) , FF 75(push), E8—00 and E8—FF areprevalent in x86 code.
Background Malware Revese Engineering Summary
Occurrence summary of bigrams
Occurrence summary of bigrams
calc explorer notepad ping shutdown
FF 15(call) 246 3045 415 58 132
FF 75(push) 235 2494 245 41 85
E8---FF(call) 1583 2201 180 87 49
E8---00(call) 746 1091 108 57 66
Background Malware Revese Engineering Summary
Code-to-data ratio
Grey area stand for dataBlue area stand for code
Original notepad.exe memory space
Packed notepad.exe memory space
Background Malware Revese Engineering Summary
Bigram Counts
There are consistent and significant shifts in the bigramcounts.The simple bigram counting approach had over a 95%success rate in distinguishing between packed andunpacked malware instance.
Background Malware Revese Engineering Summary
Systematic Approach Unpacking
Automatic Unpacking: involves running the malware andcapturing its memory image.Monitoring the execution of the malware is an intrusiveprocess and is often detected using anti-tracing andanti-debugging techniques embedded in the malware.Eureka consists of minimal monitoring and capturing theprocess image at key events:
ExitProcessByte bigram monitoring: call, push instructions for instanceNumber of seconds elapsedRun the malware without monitoring and suspend itsexecution and perform memory inspection
In practice, Eureka always manages to get a dump(memory snapshot) of the running process: no OEP andno Import table
Outline
1 Background
2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding
3 Summary
Background Malware Revese Engineering Summary
Phase 2: Disassembly
The disassembler reads the PE data structure in order to:1 Determine the different sections of the file and separate code from data
and identifies resource information such as import tables
The disassembler relies on the PE data structure (could be corrupt)The disassembler translates into code, any referenced address fromknown code location
2 Translate code segments into assembly language
The disassembler relies on the hardware instruction setdocumentation
3 Interpret data according to identified types
A data referenced by code can be of any type: integer, string, struct,etc.
Integer:0x0040F45C dword_40F45C dd 0E06D7363h, 1, 2 dup(0) ; DATA XREF: 408C98String:0x0040F45C unk_40F45C db 63h ; c ; DATA XREF: sub_408C980x0040F45D db 73h ; s0x0040F45E db 6Dh ; m0x0040F45F db 0E0h ; a
Background Malware Revese Engineering Summary
IDA Pro Disassembler
http://www.hex-rays.com/idapro/
It supports a variety of executable formats for differentprocessors and operating systems. It also can be used as adebugger for Windows PE, Mac OS X, and Linux ELFexecutables.IDA performs a large degree of automatic code analysis toa certain extent, leveraging cross-references between codesections, knowledge of parameters of API calls, limiteddataflow analysis, and recognition of standard libraries.
Hashes of known statically linked libraries are compared tohashes of identified subroutines in the code
Provides scripting languages to interact with the system toimprove the analysis.Support plug-ins: The IDA decompiler is the mostimpressive plug-in.
Background Malware Revese Engineering Summary
PE Execution
1 Read the PortableExecutable (PE) file datastructure and maps thefile into memory
2 Load import modules
1 Start execution at entrypoint
2 Runtime unpacking3 Jump to OEP
Background Malware Revese Engineering Summary
Fixing the Disassembled Code
Unpacked & disassembled code does not have an OEP, asshown in Eureka framework.Import tables are rebuilt dynamically and there are nostatic references to dynamically loaded librariesHeader information is not reliableData is not typed
Background Malware Revese Engineering Summary
Challenges in Binary Code Disassembly
Disassembly is not an exact science: On CISC platformswith variable-width instructions, or in the presence ofself-modifying code, it is possible for a single program tohave two or more reasonable disassemblies. Determiningwhich instructions would actually be encountered during arun of the program reduces to the proven-unsolvablehalting problem.Bad disassembly because of variable length instructionsJumps into middle of instructionsNo reachability analysis: Unreachable code can hide data.
Background Malware Revese Engineering Summary
Examples of Disassembly problems (The StormWorm)
Data hidden as code:ArcadeWorld.exe:0042BDB0 mov eax, 0ArcadeWorld.exe:0042BDB5 test eax, eaxArcadeWorld.exe:0042BDB7 jnz short loc_42BDD8; unreachable code
ArcadeWorld.exe:0042BDD8 loc_42BDD8: ; CODE XREF: 0042BDB7ArcadeWorld.exe:0042BDD8 lea edx, [ebp+401C42h]ArcadeWorld.exe:0042BDDE lea eax, [ebp+40B220h]ArcadeWorld.exe:0042BDE4 push eaxArcadeWorld.exe:0042BDE5 push 8A20hArcadeWorld.exe:0042BDEA push edxArcadeWorld.exe:0042BDEB call near ptr unk_42BF7DArcadeWorld.exe:0042BDF0 pushaArcadeWorld.exe:0042BDF1 mov edi, [ebp+40C067h]ArcadeWorld.exe:0042BDF7 add edi, [ebp+40C03Fh]ArcadeWorld.exe:0042BDFD lea esi, [ebp+40BFBFh]
Background Malware Revese Engineering Summary
API Resolution
User-level malware programs require system calls toperform malicious actionsUse Win32 API to access user level librariesObufscations impede malware analysis using IDA Pro orOllyDbg
Packers use non-standard linking and loading of dllsObfuscated API resolution
Background Malware Revese Engineering Summary
Standard API Resolution
Imports in IAT identified by IDA by looking at Import Table
Background Malware Revese Engineering Summary
Handling Thunks
Identify subroutines with a JMP instruction onlyTreat any calls to these subs as an API call
Handling Thunks
IsDebuggerPresent
Background Malware Revese Engineering Summary
Leveraging Standard API Address Loading
==================================================Function Name : ADSICloseDSObjectAddress : 0x76e30826Relative Address : 0x00020826Ordinal : 142 (0x8e)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================
==================================================Function Name : ADSICloseSearchHandleAddress : 0x76e3050aRelative Address : 0x0002050aOrdinal : 143 (0x8f)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================
==================================================Function Name : ADSICreateDSObjectAddress : 0x76e30447Relative Address : 0x00020447Ordinal : 144 (0x90)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================
Background Malware Revese Engineering Summary
Using Dataflow Analysis
Identify register based indirect calls
Using Dataflow Analysis
• Identify register based indirect calls GetEnvironmentStringW
use
def
Background Malware Revese Engineering Summary
Handling Dynamic Pointer Updates
Identify register based indirect callsHandling Dynamic Pointer Updates
• Identify register based indirect calls
dword_41e304 has no static value to look up API
use
def
A def to dword_41e308 is found Look for probable call to GetProcAddress earlier
Call to GetProcAddress
Background Malware Revese Engineering Summary
Rebuilding the unpacked executable
From a damaged dumped image of a running malware to aPE executable:
Knowing all APIs allows us to identify the OEP.Semantic approach: ExitProcess, CreateMutex,GetCommandLine, GetModulaHandle, etc are close toOEP.There are about 20 APIs that are often called at thebeginning of the execution of the code.Structural approach: find sources of call graphs in thebinaryRebuilding in import table with all references to identifiedAPIs
The disassembly of the reconstructed PE is often of betterquality than the disassembly of the dumped process image
The new PE code bypasses the unpacking routineembedded in the packed codeThe new PE contains the original code
Outline
1 Background
2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding
3 Summary
Background Malware Revese Engineering Summary
Phase 3: Decompilation
Identifies local variablesIdentifies arguments: registers, stack, or any combinationIdentifies global variablesIdentify calling conventionsIdentifies common idioms and compiler featuresEliminates the use of registers as intermediate variablesIdentifies control structures
Background Malware Revese Engineering Summary
Decompilation Depends on Previous Analysis PhasesDecompilation Depends on Previous Analysis Phases
Background Malware Revese Engineering Summary
Analysis Phases
Analysis Phases
Compiler
Source Code
Executable code
Assembly code
Legitimate C/C++
that a compiler
would generate
Disassembly &
Analysis
Decompilation
Ideally
Unpacking
Malware
Non-executable
code
Obfuscated
assembly
A mess
Disassembly &
Analysis
Decompilation
Reality
Assembly code
Legitimate C/C++
Undo
Obfuscation
Decompilation
Background Malware Revese Engineering Summary
Example of Binary RewriteExample of Binary Rewrite
.text:009B327C OBFUSCATED_VERSION_OF_is_private_subnet proc near
.text:009B327C mov ecx, eax
.text:009B327E and ecx, 0FFFFh
.text:009B3284 cmp ecx, 0A8C0h
.text:009B328A jz loc_9B4264
.text:009B3290 jmp off_9BAAA5
/* .text:009BAAA5 off_9BAAA5 dd offset loc_9AB10C */
.text:009B3290 OBFUSCATED_VERSION_OF_is_private_subnet endp
.
.text:009B4264 loc_9B4264:
.text:009B4264
.text:009B4264 mov eax, 1
.text:009B4269 retn
.
.text:009AB10C cmp al, 0Ah
.text:009AB10E jz loc_9B4264
.text:009AB114 jmp off_9BA137
/* .text:009BA137 off_9BA137 dd offset loc_9B2CE4 */
.
.text:009B2CE4 and eax, 0F0FFh
.text:009B2CE9 cmp eax, 10ACh
.text:009B2CEE jz loc_9B4264
.text:009B2CF4 jmp off_9BA3CC
/* .text:009BA3CC off_9BA3CC dd offset loc_9ABA24 */
.
.text:009ABA24 xor eax, eax
.text:009ABA26 retn
.text:009A1311 is_private_subnet proc near
.text:009A1311 mov ecx, eax
.text:009A1313 and ecx, 0FFFFh
.text:009A1319 cmp ecx, 0A8C0h
.text:009A131F jz loc_9A1340
.text:009A1325 cmp al, 0Ah
.text:009A1327 jz loc_9A1340
.text:009A132D and eax, 0F0FFh
.text:009A1332 cmp eax, 10ACh
.text:009A1337 jz loc_9A1340
.text:009A133D xor eax, eax
.text:009A133F retn
.text:009A1340
.text:009A1340 loc_9A1340:
.text:009A1340 mov eax, 1
.text:009A1345 retn
.text:009A1345 is_private_subnet endp
bool __usercall is_private_subnet (unsigned __int16 a1) {
return a1 == 43200 || a1 == 10 || (a1 & 0xF0FF) == 4268;
}
��int __usercall OBFUSCATED_VERSION_OF_is_private_subnet<eax>(unsigned __int16 a1 <ax>)
{
int result; // eax@2
if ( a1 == 43200 ) result = 1;
else result = off_9BAAA5();
return result;
}
Outline
1 Background
2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding
3 Summary
Background Malware Revese Engineering Summary
Phase 4: Program Understanding
Need to identify higher-level concepts from thedeobfuscated codeNeed to interpret the code into a higher-level malwareobjectiveNeed to indentify particular features: crypto:
Functions that use crypto-related opcode, loops, etcKnown constants in crypto algorithms
Outline
1 Background
2 Malware Revese EngineeringUnpackingDisassemblyDecompilationProgram Understanding
3 Summary
Background Malware Revese Engineering Summary
Malware analysis: Deobfuscation
Malware author designs obfuscation to
Packing: to reduce the size of binaries and to createpolymorphic malware samplesBesides packing, the more advanced obfuscationtechniques are designed to slow down reverse engineeringefforts and to prevent:
the identification of API calls: identify the basic buildingblocks of the malwarethe control-flow reconstruction of the malware: follow andreconstruct the logic flowstatic analysis: determine the full functionality, triggers,hidden logic, time bombs, etc.
Background Malware Revese Engineering Summary
Why Code Obfuscation is not Easy
Malware authors can design binary code that is extremelydifficult to analyze. Using advanced programminglanguages knowledge, it is possible to create such code.Malware code should be able to run in a reliable manner.Obfuscation should not compromise this importantrequirement and should maintain the reliability of the initialcode. This requires a proof or guarantee of some sort.
Background Malware Revese Engineering Summary
Stuxnet: Keeping it “relatively” simple
Stuxnet does not use advanced binary obfuscationtechniques.The analysis of the code is challenging neverthelessStuxnet Code Characteristics:
Use of C++Use of C++ exception handlingUse of C++ classesUse of simple data encoding (encryption)Use of C structures for all data passed to the mainsubroutines:
Over 40 user-defined structuresNot recognized by disassemblers and decompilers
Background Malware Revese Engineering Summary
Summary
It is always desirable to recover from the malware adescription that is as close as possible to the original codeproduced by the authors.It is often possible to do that in practiceIt is often the only way to really determine the full capabilityof the malwareMalware analysis: deobfuscation + reverse engineeringUnpacking, Disassembling, Decompilation
Background Malware Revese Engineering Summary
Further Reading
PolyUnpack: Automating the Hidden-Code Extraction ofUnpack Malware. In ACSAC 2006OmniUnpack: Fast, Generic, and Safe Unpacking ofMalware. In ACSAC 2007Renovo: A hidden code extractor for packed executables.In WORM 2007Eureka: A Framework for Enabling Static MalwareAnalysis. In ESORICS 2008Justin: A Study of the Packer Problem and Its Solutions. InRAID 2008