IDA and obfuscated code
Hex-Rays
Ilfak Guilfanov
2
Presentation Outline
Is obfuscated code a problem for IDA Pro?
  IDA Pro expects nice proper code
A lost battle?
  At first sight, yes
Solutions exist
  They are numerous...
Future development
  Your feedback
Online copy of this presentation is available at http://www.hex-rays.com/idapro/ppt/caro_obfuscation.ppt
3
Sample obfuscated code
IDA is a static analysis tool and it makes many assumptions about the input code
When these assumptions are violated, the analysis goes wrong
In an extremely simple case, call instructions are expected to return to the next instruction:
problem
The solution will be presented later...
4
Obfuscation categories
Redundancy
  Blows up the code size: code cleaning is necessary
Camouflage
  Hide & seek: the seeker is bound to win
Anti-debugger tricks
  Tricks can be learned even by old dogs
Since it is “just” obfuscation, a determined reverse engineer will eventually overcome it
5
Redundancy
Instructions with no effect
Useless jumps
Complex computations with a constant result
Code duplication
6
Instructions with no effect
In fact CL is zero
7
Instructions with no effect - countermeasures
Replace them with NOPs
Collapse regions of useless instructions into one line (select useless instructions, then View, Hide)
Ideally, a plugin would clean up the code. The Hex-Rays decompiler ignores useless instructions because it simply removes all dead code, but it cannot handle obfuscated code well; expect improvements in this direction
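The first countermeasure, overwriting useless instructions with NOPs, can be modelled outside IDA on a plain byte buffer. The following is only a minimal Python sketch under stated assumptions: the function name nop_out and the sample bytes are hypothetical, and the analyst is assumed to have already decided which byte ranges are useless.

```python
NOP = 0x90  # single-byte x86 NOP opcode


def nop_out(code, ranges):
    """Return a copy of `code` with every [start, end) range filled with NOPs.

    `ranges` is assumed to list byte ranges that the analyst has already
    judged to be useless; this sketch does not decide that by itself.
    """
    patched = bytearray(code)
    for start, end in ranges:
        if not 0 <= start <= end <= len(patched):
            raise ValueError("range outside of the code buffer")
        patched[start:end] = bytes([NOP]) * (end - start)
    return bytes(patched)


# mov eax, 1 / xchg ebx, ebx (no effect) / ret
sample = bytes.fromhex("b801000000" "87db" "c3")
cleaned = nop_out(sample, [(5, 7)])
print(cleaned.hex())
```

Keeping the buffer length unchanged is deliberate: all addresses and jump targets stay valid, which is why NOP patching is safer than deleting bytes.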
8
Useless jumps
Text view is pretty useless:
9
Useless jumps
Graph view is slightly better:
A plugin to clean the graph and combine adjacent nodes would be really useful (can be done without modifying the database)
10
Graph view and plugins
Graphs generated by IDA can be modified by a plugin on the fly: just hook to the grcode_changed_graph event
This allows for improving the graph. Some ideas:
  Combine sequential nodes into one
  Hide dead code paths
  Remove dead edges
  Add annotations to graph nodes/edges
  Automatically recognize and collapse patterns (e.g. strlen)
  Local optimization (within a node; constant folding, etc)
All this can be really useful for obfuscated code!
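The first idea, combining sequential nodes, can be illustrated without the SDK. The sketch below is a standalone Python model, not IDA graph code: the graph is assumed to be a dictionary mapping each node to its list of successors (every node must appear as a key), and merge_sequential_nodes is a hypothetical name.

```python
def merge_sequential_nodes(succs):
    """Merge node B into node A whenever A's only successor is B and
    B's only predecessor is A (typical for chains of useless jumps).

    Returns (merged successor map, map of surviving node -> original nodes).
    """
    succs = {node: list(targets) for node, targets in succs.items()}
    members = {node: [node] for node in succs}
    changed = True
    while changed:
        changed = False
        # Recompute predecessors for the current state of the graph.
        preds = {node: [] for node in succs}
        for node, targets in succs.items():
            for target in targets:
                preds[target].append(node)
        for a in list(succs):
            if len(succs[a]) != 1:
                continue
            b = succs[a][0]
            if b == a or len(preds[b]) != 1:
                continue
            succs[a] = succs.pop(b)          # A inherits B's outgoing edges
            members[a].extend(members.pop(b))
            changed = True
            break                            # restart with fresh predecessors
    return succs, members


graph = {"A": ["B"], "B": ["C"], "C": ["D", "E"],
         "D": ["F"], "E": ["F"], "F": []}
merged, members = merge_sequential_nodes(graph)
print(merged)
print(members)
```

The members map records which original nodes ended up in each combined node, so nothing is lost and the underlying database does not need to be modified.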
11
Constant result calculations
Some constant calculations can be easily handled
Ctrl-R
12
When there are too many offsets...
The answer is obvious: write a script or a plugin :)
Here's a very simple one-line script:
  OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0)
To make your life even easier, you may assign a script to a hotkey. Press Shift-F2 and enter:
  AddHotkey("w", "make_ebp_offset");
  }
  static make_ebp_offset()
  {
    OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0);
The braces look unbalanced on purpose: the text entered in the Shift-F2 dialog appears to be wrapped in a function body by IDA, so the first closing brace ends that function and the last one is supplied by IDA.
This trick and many others are explained on http://www.xs4all.nl/~itsme/projects/disassemblers/ida.html
13
What if there are thousands of such offsets?...
Improve the script to check all instructions for the desired pattern. Here's how to organize a loop over all instructions:
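  auto ea, ea2;
  ea2 = MaxEA();
  // Walk over every defined item; stop if NextHead() finds nothing more
  for ( ea=MinEA(); ea != BADADDR && ea < ea2; ea=NextHead(ea, ea2) )
  {
    if ( !isCode(GetFlags(ea)) )   // skip data items
      continue;
    // Report instructions of the form "mov ebp, ..."
    if ( GetMnem(ea) == "mov" && GetOpnd(ea, 0) == "ebp" )
      Message("%a: found mov ebp!\n", ea);
  }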
14
What if these offsets appear and vanish dynamically?
Well, then you have to create a plugin. It would:
  Recognize the desired pattern
  Modify the database (create an offset, code, add cmt, etc)
Such plugins are fully automatic
They hook to analysis events (frequently to custom_emu)
This is the most powerful technique but, alas, it requires DLL programming in C and using the SDK
Just three wishes for your plugins:
  Maybe a switch to turn your plugin off is a good idea
  Try to be user-friendly (for example, check if there is a comment before calling set_cmt; otherwise you may overwrite a user-defined comment)
  Do not exit to OS in the case of errors
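The first two wishes can be shown on a toy model. In the Python sketch below the database comments are assumed to be a plain dictionary keyed by address, and add_auto_comment and its enabled switch are hypothetical names; a real plugin would query the existing comment through the SDK before calling set_cmt.

```python
def add_auto_comment(comments, ea, text, enabled=True):
    """Add an automatic comment at address `ea`.

    Does nothing when the plugin is switched off or when a comment
    (possibly written by the user) is already present at that address.
    Returns True only if a comment was actually added.
    """
    if not enabled:
        return False
    if comments.get(ea):
        return False          # never overwrite an existing comment
    comments[ea] = text
    return True


db_comments = {0x401000: "user: decryption loop"}
print(add_auto_comment(db_comments, 0x401000, "auto: ebp offset"))
print(add_auto_comment(db_comments, 0x401005, "auto: ebp offset"))
```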
15
Constant calculations – some ideas
Create a script or plugin to:
  Add calculation results as comments (what about a script that traces the application and adds register values as comments for each instruction?)
  Modify the database and simplify instructions
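As a rough illustration of the "calculation results as comments" idea, the following Python sketch tracks registers whose values are statically known across a straight-line instruction sequence. It is a toy model under stated assumptions: instructions are (mnemonic, destination register, source) tuples, only mov, add, sub and xor are understood, and fold_constants is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def fold_constants(instructions):
    """Return one comment (or None) per instruction, giving the value of
    the destination register whenever it can be computed statically."""
    known = {}      # register name -> known 32-bit value
    comments = []
    for mnem, dst, src in instructions:
        # A source is either an immediate (int) or a register name (str).
        src_val = known.get(src) if isinstance(src, str) else src & MASK32
        dst_val = known.get(dst)
        if mnem == "mov":
            result = src_val
        elif src_val is None or dst_val is None:
            result = None
        elif mnem == "add":
            result = (dst_val + src_val) & MASK32
        elif mnem == "sub":
            result = (dst_val - src_val) & MASK32
        elif mnem == "xor":
            result = dst_val ^ src_val
        else:
            result = None     # unknown instruction: give up on this register
        if result is None:
            known.pop(dst, None)
            comments.append(None)
        else:
            known[dst] = result
            comments.append(f"{dst} = 0x{result:08X}")
    return comments


program = [
    ("mov", "eax", 0x1000),
    ("add", "eax", 0x234),
    ("xor", "eax", 0x1234),
    ("mov", "ebx", "ecx"),    # ecx is unknown, so no comment
]
folded = fold_constants(program)
for insn, comment in zip(program, folded):
    print(insn, comment)
```

A trace-based script, as suggested above, would obtain the same values from the running process instead of computing them.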
16
Camouflage
Opaque predicates
Proprietary virtual machine
Encryption/compression
Message-driven systems
No direct references: PIC (position independent code)
Hidden execution flow using SEH
Rootkit techniques
Hidden entry point (TLS callbacks, entry point in the resources section or in the header)
17
Opaque predicates
By definition, an opaque predicate is a predicate (an expression that evaluates to either "true" or "false") whose outcome is known to the programmer a priori but which, for a variety of reasons, still needs to be evaluated at run time
In practice, the expression may evaluate to any integer value, not just true or false:
GetLastError returns 0x57 (Invalid Parameter)
18
Opaque predicates
They may come in many varieties. Since we cannot determine the outcome statically, we have to find it out ourselves and
  Inform IDA about the predicate outcome
  Prune dead code paths and simplify the code
Working on graph view or pseudocode is easier
Automate this? How?
  Future versions of IDA/Hex-Rays will offer some solutions
  Interactivity and extensibility help
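Once the outcome of a predicate has been found, pruning the dead paths is a plain reachability problem. The Python sketch below is a standalone model, not an IDA feature: the control flow graph is assumed to map each node to a dictionary of edge label to successor, known_outcomes names the edge that is always taken, and all node names are hypothetical.

```python
def prune_dead_paths(cfg, entry, known_outcomes):
    """Return the part of `cfg` that is still reachable from `entry` when
    every node listed in `known_outcomes` always takes the given edge."""
    pruned = {}
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in pruned:
            continue
        edges = cfg.get(node, {})
        if node in known_outcomes:
            label = known_outcomes[node]
            edges = {label: edges[label]}   # drop the edge that is never taken
        pruned[node] = dict(edges)
        stack.extend(edges.values())
    return pruned


cfg = {
    "entry": {"true": "real", "false": "junk"},
    "real": {"next": "exit"},
    "junk": {"next": "junk2"},
    "junk2": {"next": "exit"},
    "exit": {},
}
simplified = prune_dead_paths(cfg, "entry", {"entry": "true"})
print(simplified)
```

The known outcomes still have to come from the analyst (or a debugger run); the sketch only shows how much code disappears once they are supplied.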
19
Proprietary virtual machine
Many implementations use this obfuscation method
Requires reverse engineering the virtual machine
Examples:
  Themida & Code Virtualizer (http://www.oreans.com/)
  Various malware
In the general case, building a processor module for the VM is required
Let me show you a simple case
20
Bagle malware case
This mass mailer contains the following code sequence:
21
Bagle - opcodes
Opcode handlers are very simple; I renamed them:
22
Bagle – opcode table
After renaming all handlers the opcode table was:
23
Bagle – create opcode enumeration
The following script created an enumeration for all VM opcodes based on the handler names:
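The original script is not reproduced in this copy. The general idea, deriving enumeration members from the renamed handlers in opcode-table order, might look like the Python sketch below. Everything in it is an assumption made for illustration: the handler names, the vm_ and VM_ prefixes and the function names are hypothetical, and a real script would create the enumeration inside the IDA database instead of returning a dictionary.

```python
def build_opcode_enum(handler_names, handler_prefix="vm_", member_prefix="VM_"):
    """Map each opcode (its index in the handler table) to an enum member
    name derived from the handler name, e.g. vm_push_imm -> VM_PUSH_IMM."""
    members = {}
    for opcode, handler in enumerate(handler_names):
        base = handler[len(handler_prefix):] if handler.startswith(handler_prefix) else handler
        member = member_prefix + base.upper()
        if member in members:
            # Several opcodes may share one handler; keep the names unique.
            member = f"{member}_{opcode}"
        members[member] = opcode
    return members


def decode(vm_bytes, members):
    """Render a byte string as a list of enum member names."""
    by_value = {value: name for name, value in members.items()}
    return [by_value.get(byte, f"0x{byte:02X}") for byte in vm_bytes]


# Hypothetical handler names, in opcode-table order
handlers = ["vm_push_imm", "vm_pop", "vm_xor", "vm_pop"]
vm_codes = build_opcode_enum(handlers)
print(vm_codes)
print(decode(bytes([0, 2, 1, 9]), vm_codes))
```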
24
Bagle – enumeration ready
We can use this enumeration in the disassembly now
Just declare an array of bytes and convert them to VM_CODES
All this without quitting IDA (in fact, I was in the middle of a debugging session since there was another layer of protection before the VM)
25
Bagle – virtual machine readable
Create an array of bytes, declare them as VM_CODES:
26
Bagle – VM logic visible
The logic of the VM program became visible but there were immediate constants in the code that required manual intervention:
27
Bagle – VM decoding automated
The following script solves the problem:
28
Bagle – comfortable analysis of VM
After assigning a hotkey to the previous script, it was almost the same as having a processor module for the VM
However, another level of deobfuscation is required
(0x63FE34B2 ^ 0x9C01CB4D = 0xFFFFFFFF)
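The arithmetic in parentheses is easy to check: the two immediates are bitwise complements of each other, so their XOR has all 32 bits set. A short Python check (the helper name xor32 is hypothetical):

```python
MASK32 = 0xFFFFFFFF


def xor32(a, b):
    """XOR two values as 32-bit unsigned integers."""
    return (a ^ b) & MASK32


first, second = 0x63FE34B2, 0x9C01CB4D
combined = xor32(first, second)
print(f"0x{combined:08X}")
# The second constant is simply the bitwise complement of the first one.
print((~first & MASK32) == second)
```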
29
VM - summary
We have to
  Analyze VM opcodes
  Give them meaningful, descriptive names
  In simple cases, simple enumeration will do the job
  In complex cases, a processor module has to be developed
It is not _that_ difficult after all ;)
Rolf Rolles created a processor module for a VM: http://www.openrce.org/articles/full_view/28
30
Executable packing
Plethora of packing methods, good and bad
Manual unpacking is always possible; automatic unpacking would be ideal
There are sample scripts and plugins in IDA
  uunp: proof of concept unpacker plugin, exists as an IDC script as well
  unpack: another sample unpacker
IDA stayed away from this arms race
There are many other solutions available (unpackers, process dumpers, etc)
31
Executable packing - approaches
Static analysis
  too time consuming
  requires tedious manual work
Dynamic analysis (debugger)
  much faster
  requires special sandboxed environment
  vulnerable to anti-debugger tricks
Code emulation
  a good idea
  any widespread emulator will be attacked
  emulation imperfections are a problem
No ideal solution...
32
Encryption
Methods vary from simple XOR encryption to serious encryption schemes like AES, Blowfish, etc
Since the key must be present to run the executable, the strength of the encryption method does not matter
Ideally we just let the application decrypt itself and then take a memory snapshot
If only part of the executable is decrypted at a time, then we need to automate the process of taking memory snapshots
33
Position independent code
No fixed addresses means no xrefs
Analysis is harder but user-defined offsets can help
34
Anti-debugging tricks
I'm sure you know better since you are the practitioners :)
IDA related:
  Its default settings are not good for hostile code debugging
  Exceptions are handled by the debugger by default; change this in the debugger settings
Just two simple methods
35
Use tracing to find anti-debugging tricks
Tracing is slow but it may be used to find why/when/how the process misbehaves
Sample trace log from naïve code:
36
Simple method to neutralize found tricks
Use a “conditional” breakpoint to neutralize tricks encountered while single-stepping
The breakpoint condition for the call instruction is
  ip=ip+2
Breakpoint conditions may call all defined IDC functions (including user-defined ones); they can be used for logging and for changing the application behavior
37
Debugger – current state
IDA debugger advantages
  The annotated database is available during debugging
  All facilities continue to work: FLIRT signatures, function prototypes and argument names, structures, enumerations, your scripts and plugins, etc...
  Scriptable
  Available on multiple platforms (+remote debugging)
Shortcomings
  Slow operation
  Multithreaded applications poorly handled
  Only application level debugging is available
We continue to work on the shortcomings
Future versions will be more fit for hostile code analysis
38
Debugger - ideas
A debugger plugin to configure the 'stealth' mode
  Exceptions are passed to the application
  Calls to IsDebuggerPresent, NtSetInformationThread and similar functions are intercepted
Emulating debugger module
A 'stealth' debugger module
  Do not use the standard debugger interface (CreateProcess/WaitForDebugEvent)
  Inject a debugger DLL into the process and communicate with it (the must-have functionality is breakpoint handling and memory access)
Higher level debugging
  Skip hidden code areas, group nodes in the graph view
  Source level debugging using the pseudocode view
39
Summary
Obfuscation methods vary; there is no single recipe for all cases
The key is to be able to represent the code nicely on the screen
The problem is generic: what to do if IDA displays things not the way I want?
The answer is: modify the output!
  Use interactive commands, menus, etc
  Represent data in a meaningful way
  Hide irrelevant information
  Patch the database and simplify it
  Create scripts, plugins, processor modules to avoid routine work
40
The obfuscating call instruction
The function returns a few bytes further than it would normally:
41
Example: solution to obfuscating call
The idea: intercept emulation of calls to “ex_obfuscating” and create correct xrefs
Just a few lines of code (unfortunately, a plugin)
Can be made more complex if necessary
The source code of the sample plugin can be found at http://www.hexblog.com/ida_pro/files/ex_deobfuscate.zip
See the next slide for the essential part of the plugin
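Independently of the plugin source, the address arithmetic behind the corrected xrefs can be modelled in a few lines of Python. This is only a sketch under assumptions: a 5-byte near call (E8 rel32) is assumed, the number of skipped bytes is a hypothetical parameter because the slides only say "a few bytes", and return_targets is not a function of the plugin.

```python
CALL_SIZE = 5  # assumed: x86 near call, opcode E8 followed by rel32


def return_targets(call_sites, skipped_bytes, call_size=CALL_SIZE):
    """For every call to the obfuscating function, compute where execution
    really continues: the normal return address plus the skipped bytes.

    Returns a list of (call address, real continuation address) pairs,
    i.e. the flow xrefs that should replace the default fall-through.
    """
    xrefs = []
    for call_ea in call_sites:
        normal_return = call_ea + call_size
        xrefs.append((call_ea, normal_return + skipped_bytes))
    return xrefs


# Hypothetical call site and skip count
for src, dst in return_targets([0x401000], skipped_bytes=3):
    print(f"{src:#x} -> {dst:#x}")
```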
42
Plugin to handle weird call instructions
43
Deobfuscated code
Note the arrow on the left side of the listing
Graph could be simplified further by a plugin
44
The “thank you” slide
Thank you for your attention!
Questions?