Practical Timing Side Channel Attacks Against Kernel Space ASLR

  • View
    60

  • Download
    2

Embed Size (px)

DESCRIPTION

Practical Timing Side Channel Attacks Against Kernel Space ASLR. Fall 2014 Instructor: Kun Sun, Ph.D. Outline. Background Side Channel Attack Kernel Space ASLR Use timing-based side channel to defeat kernel space ASLR (A heavy engineering work) - PowerPoint PPT Presentation

Text of Practical Timing Side Channel Attacks Against Kernel Space ASLR

Advanced Network Security

Practical Timing Side Channel Attacks Against Kernel Space ASLRFall 2014Instructor: Kun Sun, Ph.D.A little background about myself. 2

OutlineBackground Side Channel Attack Kernel Space ASLRUse timing-based side channel to defeat kernel space ASLR (A heavy engineering work)How to prevent/mitigate this side channel attack?3

OutlineBackground Side Channel Attack Kernel Space ASLRUse timing-based side channel to defeat kernel space ASLRHow to prevent/mitigate this side channel attack?Side ChannelSide channel vs. Covert channelSide channel: incidentally, unintentionally Covert channel: colluding, intentionallyA side channel is a communication channel based on the use of system resources not normally intended for communication between the subjects (processes) in the system

Side Channel TypesComputer systems abound with side channelsStorage Channelsuse system variables and attributes (other than time) to signal informationclassic example is resource exhaustion channelTiming Channelsvary the amount of time required to complete a task to signal informationclassic example is load sensing channelSide channels are typically noisy but information theory techniques can be used to achieve error-free communicationResource Exhaustion ChannelGiven 5MB pool of dynamically allocated MemoryHigh Processbit = 1 request 5MB of memorybit = 0 request 0MB of memoryLow Processrequest 5MB of memoryif allocated then bit = 0 otherwise bit = 1Load Sensing ChannelHigh Processbit = 1 enter computation intensive loopbit = 0 go to sleep

Low Processperform a task with known computational requirementsif completed quickly then bit = 0 otherwise bit =1Mitigating Side ChannelAudit for storage-basedTime normalization for timing-basedSlow it down, reduce the transmission date rate9

OutlineBackground Side Channel Attack Kernel Space ASLRUse timing-based side channel to defeat kernel space ASLRHow to prevent/mitigate this side channel attack?ASLR DeploymentWindows user space and kernel space ASLR since vista.Pax patches in Linux (not in kernel)MacOS since 10.5Mobile OSes including Android and iOS

User Space ASLR vs. Kernel Space ASLRShacham et al. showed that user mode ASLR on 32-bit architectures only leaves 16 bit of randomness, which is not enough to defeat brute-force attacks.However, such brute-force attacks are not applicable for kernel space ASLR.An attacker wants to exploit a vulnerability in kernel code, a wrong offset will cause a complete crash of the system and thus an attacker has only one attempt to perform an exploit.Windows Kernel Space ASLRFor both 32-bit and 64 bit Vista, Windows 7, and Windows 8.Obtain the info by reverse-engineeringDuring boot process, Windows loader allocates kernel_region for the kernel image and the hardware abstraction layer (HAL)

The entropy is 6 bit, 32*2=64=26

13

OutlineBackground Side Channel Attack Kernel Space ASLRUse timing-based side channel to defeat kernel space ASLRHow to prevent/mitigate this side channel attack?Timing-based Side Channel AttackTarget at how a local attacker with restricted rights can mount a timing-based side channel attack against the memory management system to deduce information about the privileged kernel space ASLR.Attack ModelSystem is protected by ASLR andLocal adversary can run user-space applications, but cannot execute privileged kernel mode code.

Challenges on Exploiting the Attack Timing attacks against a modern CPU are complicated.Performance optimizations including hardware prefetching, speculative execution, multi-core architectures, or branch prediction. Target on modern CPU and new OSPrevious work focused on older processors without such optimization.Instruction Set Architecture: Defined as part of the processor architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external IO.

Kernel related registers: Task register (TR), Interrupt Descriptor Table Register (IDTR), Global Descriptor Table Register (GDTR), Local Descriptor Table Register (LDTR)

TLB and CacheTranslation lookaside buffer: convert virtual address to physical addressVirtually indexed, physically tagged

http://www.cs.princeton.edu/courses/archive/fall04/cos471/lectures/19-VirtualMemory.pdfIntel i7 Memory Architecture plus Clock Latency

For Physical address extension (PAE) and 64 bit mode: page map level 4 (PML4) and Page directory point (PDP)General ApproachTwo steps: Set the system in a specific state from user mode. Measure the duration of a certain memory access operation. The time span of this operation then (possibly) reveals certain information about the kernel space layout.One example: Cache a system call handler by executing sys call. Access a designated set of user space addresses and execute the system call again. If the system call takes longer than expected, the access of user space addresses evicted the system call handler code and reveals address info of system code.Handling NoiseCircumvent performance optimizationsE.g., test memory in non-consecutive order to circumvent memory pre-fetching.OS Scheduler, shared cacheTwo steps requires 5000 clock cyclesOS time slides several msGuarantee to find cases with No Interrupts between two steps.

Three AttacksAttacks on L1/L2/L3 caches and TLB/PS cachesCache probingDouble page faultCache preloadingCache ProbingBased on multiple memory addresses mapped into the same cache set and compete for available slotsn-way set associative mode.all available slots are grouped into sets of the size n and each memory chunk can be stored in all slots of one particular set.E.g., in 32-bit address, a typical L3 cache of 8 MB is 16-way set associative. It consists of (8, 1921, 024)/64 = 131, 072 single slots that are grouped into 131, 072/16 = 8, 192 different sets.Reverse-engineering Sandybridge Hash Function for L3 parallel access among multiple coresDouble Page FaultBased on the observation on Intel CPUsWhen the page translation was successful, but the access permission check fails (e.g., when kernel space is accessed from user mode), a TLB entry is indeed created.Do not work on AMD CPUsCan be easily fixed by hardware redesign.The double page fault timings yield an estimation for the allocation map of the kernelspace, but do not determine at which concrete base addressesthe kernel and drivers are loaded to.Double Page FaultFirst page fault: for each kernel space page p, access p from user mode and a page fault will be handled by the operating system and forwarded to the exception handler of the process. When p refers to an allocated page, the MMU creates a TLB entry for p although the succeeding permission check fails. When p refers to an unallocated page, since the translation fails, the MMU does not create a TLB entry for p.Second page fault: accessing p again, if p refers to an allocated kernel page, the page fault will be delivered faster due to the inherent TLB hit.Cache PreloadingDouble page fault estimates the allocation map of the kernel space, but do not determine at which concrete base addresses the kernel and drivers are loaded to.3 steps in Cache Preloading: flush all caches (i.e., address translation and instruction/data caches) to start with a clean state. preload the address translation caches by indirectly calling into kernel code, for example by issuing a sysenter operation. Intentionally generate a page fault by jumping to some kernel space address and measure the time. If the faulting address lies in the same memory range as the preloaded kernel memory a shorter timeTestbedTested on 32-/64-bit systems running Windows 7/LinuxTested on different CPUs + VM:Intel i7-870 (Nehalem/Bloomfield, Quad-Core)Intel i7-950 (Nehalem/Lynnfield, Quad-Core)Intel i7-2600 (Sandybridge, Quad-Core)AMD Athlon II X3 455 (Triple-Core)VMWare Player 4.0.2 on Intel i7-870 (with VT-x)Results

28

OutlineBackground Side Channel Attack Kernel Space ASLRUse timing-based side channel to defeat kernel space ASLR (A heavy engineering work)How to prevent/mitigate this side channel attack?Mitigation ApproachesRoot cause: memory hierarchy is a shared resourceAlso, no enough randomization: 64 different slots for windows kernelOption one: split caches for user and kernel.Shortcoming: checking overhead, maximum cache size reducedOption two: suppress the creation of TLB entries on successful address translation if an access violation happensOption three: Timing normalization; modify the execution time of the page fault handler

Future worksTest MacOSTest AndroidObtain the physical address of a certain memory location from user mode.Questions?