Malware Collection and Analysis via Hardware Virtualization
Tamas K Lengyel
Computer Science and Engineering
11/10/2015
Outline
1. Introduction and Problem statement
2. Background, Challenges & Approach
3. Limitations and scope
4. Publications to date
5. Malware collection system & results
6. Malware analysis system & results
7. Hardware and software limitations
8. Contributions
9. Future work
Introduction
• 1,000,000 new malware binaries a day
• Thwarting malware requires in-depth understanding of its operation
  • Collect and analyze malware
• Existing tools and techniques are impeded by modern malware techniques
  • Packing, evasion and metamorphism
• Hardware virtualization has been proposed to counter these techniques
Requirements
1. Scalability
   Maximize the number of concurrently active collection and analysis sessions on limited hardware resources
2. Stealth
   Malware must not be able to detect the monitoring environment
3. Fidelity
   The collected data has to be accurate
4. Isolation
   Monitoring components have to be securely isolated, and cross-contamination must be prevented
Prominent prior work
• 2005: Vrable et al. - Scalability, fidelity, and containment in the potemkin virtual honeyfarm
• 2008: Payne et al. - Lares: An architecture for secure active monitoring using virtualization
• 2008: Dinaburg et al. - Ether: malware analysis via hardware virtualization extensions
• 2013: Deng et al. - Spider: Stealthy binary program instrumentation and debugging via hardware virtualization
Problem statement
Developing effective anti-malware technologies requires the collection and rapid analysis of an increasing number of malware samples such that all four requirements are met simultaneously.
No comprehensive evaluation to date has been performed to determine whether virtualization is an effective platform for the development of such tools.
Virtualization
Challenges
1. Scalability
   Disk and memory requirements are linear
2. Stealth
   In-guest tools can be detected
3. Isolation
   In-guest tools can be disabled
   Cross-contamination of VMs over the network
4. Fidelity
   Data collection is negatively impacted by 2 & 3
Our approach
1. Study current malware techniques
2. Develop out-of-guest tools
3. Conduct live experiments
4. Evaluate results
5. Study shortcomings and limitations
Limitations
1. Definition of malware
   Constantly evolving and undefined set
2. Measurements and metrics
   Requirements are not always quantifiable
   Results are only indicative, not definitive
   We work to counter current malware techniques
3. Repeatability of experiments
   External entities outside our control
Scope
• Malware analysis vs. malware detection
  • Black Box Analysis
  • We only aim at collecting relevant information which may aid malware detection
• Detection of virtualization vs. detection of monitoring
  • Virtualization is already widely deployed
• Determining when we collected enough data
  • Halting problem
Publications
• CSET’12: Virtual Machine Introspection in a Hybrid Honeypot Architecture. Acceptance rate: 48%
• NSS’13: Towards Hybrid Honeynets via Virtual Machine Introspection and Cloning. Acceptance rate: 24%
• SHCIS’14: Multi-tiered Security Architecture for ARM via the Virtualization and Security Extensions
• MMF’14: Pitfalls of Virtual Machine Introspection on Modern Hardware.
• MMF’14: Code Validation for Modern OS Kernels
• ACSAC’14: Scalability, Fidelity and Stealth in the DRAKVUF Dynamic Malware Analysis System. Acceptance rate: 19.9%
• SHCIS’15: Virtual Machine Introspection with Xen on ARM
• C&TC’15: CloudIDEA: A Malware Defense Architecture for Cloud Data Centers. Acceptance rate: 38%
Malware collection
Primary requirement: capture malware binaries
• Scalability: Deploy copy-on-write disk and memory sharing
• Stealth: No in-guest agents, no modification to the hypervisor
• Isolation: External agent + network isolation
• Fidelity: Kernel heap pool-tag scanning
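The copy-on-write idea behind the scalability bullet can be sketched as follows. This is a minimal sketch on a toy, reference-counted page, not the hypervisor's actual memory-sharing interface; the names `cow_page`, `cow_new`, `cow_clone` and `cow_write` are hypothetical and exist only for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* A reference-counted page that several clones may share (hypothetical type). */
struct cow_page {
    unsigned refcount;
    unsigned char data[PAGE_SIZE];
};

/* Allocate a zero-filled page owned by a single VM. Returns NULL on failure. */
static struct cow_page *cow_new(void)
{
    struct cow_page *p = calloc(1, sizeof(*p));
    if (p)
        p->refcount = 1;
    return p;
}

/* Cloning costs no extra memory: the clone only takes another reference. */
static struct cow_page *cow_clone(struct cow_page *p)
{
    p->refcount++;
    return p;
}

/*
 * Write one byte through *slot. If the page is shared, it is duplicated
 * first so the other owners never observe the write.
 * Returns 0 on success, -1 on a bad offset or allocation failure.
 */
static int cow_write(struct cow_page **slot, size_t off, unsigned char val)
{
    struct cow_page *p = *slot;

    if (off >= PAGE_SIZE)
        return -1;

    if (p->refcount > 1) {
        struct cow_page *copy = malloc(sizeof(*copy));
        if (!copy)
            return -1;
        memcpy(copy->data, p->data, PAGE_SIZE);
        copy->refcount = 1;
        p->refcount--;      /* the writer drops its share of the original */
        *slot = copy;
        p = copy;
    }

    p->data[off] = val;
    return 0;
}
```

In this model a freshly cloned VM consumes memory only for the pages it actually modifies, which is what lets many sessions run on limited hardware.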
Network Isolation
Fidelity via pool tag scanning
struct {
    union {
        struct {
            uint16_t previous_size : 9;
            uint16_t pool_index    : 7;
            uint16_t block_size    : 9;
            uint16_t pool_type     : 7;
        };
        uint16_t flags;
    };
    uint32_t pool_tag;
} _POOL_HEADER;
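A minimal sketch of what scanning for a pool tag could look like, given a header laid out as above (4 bytes of bitfields followed by a 4-byte tag). It assumes a raw memory dump held in a byte buffer, headers aligned to 8 bytes, and a little-endian tag; the function names and the alignment value are assumptions made for illustration, not the collection system's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_HEADER_SIZE 8u  /* 4 bytes of bitfields + 4-byte tag */
#define POOL_ALIGNMENT   8u  /* assumed allocation granularity */
#define POOL_TAG_OFFSET  4u  /* pool_tag follows the bitfield word */

/* Read a 32-bit little-endian value without relying on host endianness. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 |
           (uint32_t)p[3] << 24;
}

/* Pack four ASCII characters into the 32-bit tag value. */
static uint32_t make_pool_tag(const char tag[4])
{
    uint8_t raw[4];
    memcpy(raw, tag, sizeof(raw));
    return read_le32(raw);
}

/*
 * Return the offset of the first aligned header at or after `start`
 * whose tag equals `tag`, or (size_t)-1 if there is none.
 */
static size_t find_pool_tag(const uint8_t *mem, size_t len,
                            size_t start, uint32_t tag)
{
    if (start > len)
        return (size_t)-1;

    /* Round the starting offset up to the next aligned position. */
    size_t off = (start + POOL_ALIGNMENT - 1) / POOL_ALIGNMENT * POOL_ALIGNMENT;

    for (; off + POOL_HEADER_SIZE <= len; off += POOL_ALIGNMENT) {
        if (read_le32(mem + off + POOL_TAG_OFFSET) == tag)
            return off;
    }
    return (size_t)-1;
}
```

Calling `find_pool_tag` repeatedly, each time starting just past the previous hit, walks every candidate allocation carrying the tag. A real scanner would also sanity-check the size and type fields before trusting a match, since four matching bytes can occur by chance.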
Captured malware samples
Results: scalability
Malware analysis
Primary requirement: capture useful live data
• Scalability: Re-use CoW techniques from prior experiments
• Stealth: No in-guest agents, no modification to the hypervisor, command injection with VMI
• Isolation: VLAN tagging, TCB disaggregation
• Fidelity: Syscalls and kernel heap-allocations
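To make the VLAN tagging bullet concrete, the sketch below builds the 4-byte IEEE 802.1Q tag (TPID 0x8100 followed by priority and a 12-bit VLAN ID) that keeps traffic from different analysis VMs apart. Giving every clone its own VLAN ID is an assumption made for illustration, as are the function names; the slides do not state the actual assignment policy.

```c
#include <stdint.h>

#define VLAN_TPID    0x8100u  /* 802.1Q tag protocol identifier */
#define VLAN_VID_MIN 1u       /* VID 0 and 4095 are reserved */
#define VLAN_VID_MAX 4094u

/*
 * Write the 802.1Q tag for `vid` with priority `pcp` into `out`
 * in network byte order. The DEI bit is left at 0.
 * Returns 0 on success, -1 if vid or pcp is out of range.
 */
static int vlan_build_tag(uint16_t vid, uint8_t pcp, uint8_t out[4])
{
    if (vid < VLAN_VID_MIN || vid > VLAN_VID_MAX || pcp > 7)
        return -1;

    uint16_t tci = (uint16_t)(((unsigned)pcp << 13) | vid);

    out[0] = (uint8_t)(VLAN_TPID >> 8);
    out[1] = (uint8_t)(VLAN_TPID & 0xFFu);
    out[2] = (uint8_t)(tci >> 8);
    out[3] = (uint8_t)(tci & 0xFFu);
    return 0;
}

/*
 * Hypothetical policy: clone number N gets VLAN ID N + 1.
 * Returns 0 on success, -1 once the VLAN ID space is exhausted.
 */
static int vlan_for_clone(unsigned clone_index, uint16_t *vid)
{
    if (clone_index > VLAN_VID_MAX - VLAN_VID_MIN)
        return -1;
    *vid = (uint16_t)(VLAN_VID_MIN + clone_index);
    return 0;
}
```

With one VLAN per clone, two infected VMs cannot reach each other at layer 2, which addresses the cross-contamination concern listed under the challenges.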
Useful data?
The goal is to generate data that is complete enough to be useful for analysis
Data collection should be flexible, so it can be tuned to specific requirements
Two main objectives defined in prior art:
1. Syscall monitoring
2. Kernel heap monitoring
We also monitor deleted files, which we deemed an interesting and useful addition
System design
Syscall trapping
Stealthy breakpoint injection method:
1. Overwrite internal kernel function entry points with #BP (0xCC)
2. Read/write protect page with EPT
3. When traps hit, place back original byte
4. Singlestep 1 instruction
5. Place breakpoint back again
Can monitor all internal kernel functions, not just system calls!
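The breakpoint cycle listed above can be modelled on a plain byte buffer. This is a simplified sketch: the buffer stands in for guest memory, the EPT protection of step 2 is reduced to a read helper that always returns the original byte (the assumed purpose of the protection being to hide the 0xCC from the guest), and all type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BP_OPCODE 0xCCu  /* x86 INT3 (#BP) */

/* Bookkeeping for one injected breakpoint (hypothetical type). */
struct breakpoint {
    uint8_t      *code;      /* stand-in for guest memory */
    size_t        offset;    /* function entry point inside `code` */
    uint8_t       original;  /* byte that 0xCC replaced */
    int           armed;     /* nonzero while 0xCC is in place */
    unsigned long hits;      /* number of traps observed */
};

/* Step 1: save the original byte and overwrite it with #BP. */
static void bp_inject(struct breakpoint *bp, uint8_t *code, size_t offset)
{
    bp->code = code;
    bp->offset = offset;
    bp->original = code[offset];
    bp->hits = 0;
    code[offset] = (uint8_t)BP_OPCODE;
    bp->armed = 1;
}

/* Step 2, simplified: a guest read never observes the injected byte. */
static uint8_t bp_guest_read(const struct breakpoint *bp, size_t offset)
{
    if (offset == bp->offset)
        return bp->original;
    return bp->code[offset];
}

/* Step 3: the trap fired, so restore the original instruction byte. */
static void bp_on_trap(struct breakpoint *bp)
{
    bp->hits++;
    bp->code[bp->offset] = bp->original;
    bp->armed = 0;
}

/* Steps 4 and 5: after the single-stepped instruction, re-arm the trap. */
static void bp_on_singlestep_done(struct breakpoint *bp)
{
    bp->code[bp->offset] = (uint8_t)BP_OPCODE;
    bp->armed = 1;
}
```

Between `bp_on_trap` and `bp_on_singlestep_done` the breakpoint is absent from memory, so any other vCPU executing the same function in that window would not be trapped.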
Heap-allocation trapping
Command injection
Syscalls of 115k malware
Heap allocs of 115k malware
Files deleted
File size 100KByte+
Stalling malware
Standard methods
• Detection of virtualized environments
• Detection of in-guest artifacts
• Sleeping
Advanced methods
• Time-skew detection
• API spamming
API spamming
• Repeatedly call monitored APIs which normally complete fast
  • NtCreateSemaphore
• Logging these calls takes more time than the calls themselves
  • Spamming them makes the monitoring session time out
Use of NtCreateSemaphore in 60s:
• Observed in: 45,383 samples
• Average: 7.77
• Samples significantly above average: 1
• Number of calls: 17,453
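One simple way to surface such a sample from per-sample call counts is to flag everything that exceeds a multiple of the mean. The sketch below assumes the counts are already available as an array; the threshold factor and the function names are hypothetical, since the slides do not state how "significantly above average" was decided.

```c
#include <stddef.h>

/* Mean number of calls per sample; 0.0 for an empty set. */
static double mean_calls(const unsigned long *calls, size_t n)
{
    double sum = 0.0;

    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)calls[i];
    return sum / (double)n;
}

/*
 * Flag samples whose call count exceeds factor * mean.
 * Indices of flagged samples are stored in `flagged` (up to `cap`);
 * the return value is the total number of flagged samples.
 */
static size_t flag_spammers(const unsigned long *calls, size_t n,
                            double factor, size_t *flagged, size_t cap)
{
    double threshold = factor * mean_calls(calls, n);
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if ((double)calls[i] > threshold) {
            if (count < cap)
                flagged[count] = i;
            count++;
        }
    }
    return count;
}
```

A single extreme outlier inflates the mean itself, so on small data sets a median-based threshold may be the more robust choice.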
Summary
Hardware virtualization is effective for both malware collection and analysis
All four requirements can be met simultaneously using hardware virtualization
The technology is sufficiently flexible to develop and fine-tune data collection techniques
Major improvement in the arms-race against malware
Software limitations
Race-condition with multiple vCPUs
Hardware limitations on x86
EPT only reports violation start address
Read/write operation may be up to 8 bytes long
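Because only the start address is known, a monitor has to treat every violation conservatively, as if the full 8 bytes had been accessed. A minimal sketch of the two checks this implies, assuming 4 KiB pages; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ACCESS_LEN 8u   /* longest access assumed, per the slide */
#define PAGE_SHIFT     12   /* 4 KiB pages (assumption) */

/*
 * True if an access of unknown length (1 to 8 bytes) starting at `start`
 * could touch the byte at `watched`, e.g. an injected breakpoint.
 */
static bool access_may_touch(uint64_t start, uint64_t watched)
{
    return watched >= start && watched - start < MAX_ACCESS_LEN;
}

/*
 * True if such an access could spill over into the following page,
 * which may carry different permissions or its own breakpoints.
 */
static bool access_may_cross_page(uint64_t start)
{
    uint64_t last = start + MAX_ACCESS_LEN - 1;
    return (start >> PAGE_SHIFT) != (last >> PAGE_SHIFT);
}
```

For example, a read reported at 0x1ffc may extend to 0x2003, so a breakpoint at the very start of the next page has to be considered exposed as well.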
Hardware limitations on x86
sTLB makes TLB-splitting attacks no longer feasible
TLB can still be used to hide mappings from VMI
Hardware limitations on ARM
Split-TLB architecture without sTLB
Hardware-assisted translation available from the VMM
Translation is performed as data-fetch access
• Only hits the dTLB
Hiding code-pages on ARM is possible via split-TLB attacks
Contributions
1. Identified core requirements that must be met simultaneously
2. Developed and open-sourced the prototypes, with major contributions to existing systems
3. Performed extensive tests with modern malware
4. Identified hardware and software limitations that must be addressed when building such systems
Future work
• Keeping up with the evolving threat landscape
  • Attacks against the hypervisor and lower layers
  • Data-only malware
  • Stalling malware
• Making use of new and evolving hardware virtualization extensions
  • Hybrid VMI
• Data-mining the collected information
  • Identifying malware groups
  • Creating IDS/IPS rules
Questions?
• Dissertation text available at http://tklengyel.com/thesis.pdf
• DRAKVUF: http://drakvuf.com
• LibVMI: http://libvmi.com