12
Watchdog: Hardware for Safe and Secure Manual Memory Management and Full Memory Safety Santosh Nagarakatte Milo M. K. Martin Steve Zdancewic Computer and Information Science, University of Pennsylvania [email protected] [email protected] [email protected] Abstract Languages such as C and C++ use unsafe manual memory management, allowing simple bugs (i.e., accesses to an object after deallocation) to become the root cause of exploitable security vulnerabilities. This paper proposes Watchdog, a hardware-based approach for ensuring safe and secure manual memory management. Inspired by prior software-only proposals, Watchdog generates a unique identifier for each memory allocation, associates these identifiers with pointers, and checks to ensure that the identifier is still valid on every memory access. This use of identifiers and checks enables Watchdog to detect errors even in the presence of reallocations. Watchdog stores these pointer identifiers in a disjoint shadow space to provide comprehensive protection and ensure compatibility with existing code. To streamline the implementation and reduce runtime overhead: Watchdog (1) uses micro-ops to access metadata and perform checks, (2) eliminates metadata copies among registers via modified register renaming, and (3) uses a dedicated metadata cache to reduce checking overhead. Furthermore, this paper extends Watchdog’s mechanisms to detect bounds errors, thereby providing full hardware-enforced memory safety at low overheads. 1. Introduction Languages such as C and C++ are the gold standard for implementing low-level systems software such as operat- ing systems, virtual machine monitors, language runtimes, embedded software, and performance critical software of all kinds. Although safer languages exist (and are widely used in some domains), these low-level languages persist partly because they provide low-level control over memory layout and use explicit manual memory management (e.g., malloc and free). Unfortunately, the unsafe attributes of these lan- guages result in nefarious bugs that cause memory corrup- tion, program crashes, and serious security vulnerabilities. This paper focuses primarily on eliminating memory cor- ruption and security vulnerabilities caused by one specific aspect of low-level languages: memory deallocation errors, which directly result in dangling pointer dereference bugs This research was funded in part by the U.S. Government. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. This research was funded in part by DARPA contract HR0011-10-9-0008, ONR award N000141110596, and NSF grants CNS-1116682, CCF-1065166, and CCF-0810947. int *p, *q, *r; p = malloc(8); ... q = p; ... free(p); r = malloc(8); ... ... = *q; int* q; void foo() { int a; q = &a; } int main() { foo(); ... = *q; } Heap based Stack based Figure 1. Dangling pointer errors involving the heap and stack. On the left, freeing p causes q to become a dangling pointer. The memory pointed to by q could be reallocated by any subsequent call to malloc(). On the right, foo() assigns the address of stack-allocated variable a to global variable q. After foo() returns, its stack frame is popped, and q points to a stale region of the stack, which any inter- vening function call could alter. In both cases, dereferencing q can result in garbage values or data corruption. and use-after-free (UAF) security vulnerabilities. In essence, we seek to enforce safe manual memory management and thereby eliminate an entire class of security vulnerabilities and uncaught memory corruption bugs. Figure 1 shows two use-after-free errors, including a dangling reference to a re- allocated heap location (left) and a dangling pointer to the stack (right). Use-after-free (UAF) errors have proven to be just as severe and exploitable as buffer overflow errors: they too po- tentially allow an attacker to corrupt values in memory [6], inject malicious code, and initiate return-to-libc attacks [31]. Use-after-free vulnerabilities have been used in real-world attacks, and many such vulnerabilities in mainstream software include CVE-2010-0249 in Microsoft’s Internet Explorer, CVE-2009-1690 in Apple’s Safari and Google’s Chrome browsers, and CVE-2011-2373 in Mozilla’s Fire- fox browser. The number of use-after-free vulnerabilities reported has been increasing in recent years, in contrast with the decline in buffer overflow vulnerabilities reported [1]. The fundamental cause of use-after-free vulnerabilities is that programming languages providing manual memory management (such as C/C++) do not ensure that program- mers use those features safely. Unfortunately, the commonly advocated alternative — developing software in garbage- 1

Watchdog: Hardware for Safe and Secure Manual Memory ... · Watchdog: Hardware for Safe and Secure Manual Memory Management and Full Memory Safety Santosh Nagarakatte Milo M. K. …

Embed Size (px)

Citation preview

Watchdog: Hardware for Safe and Secure ManualMemory Management and Full Memory Safety

Santosh Nagarakatte Milo M. K. Martin Steve ZdancewicComputer and Information Science, University of Pennsylvania

[email protected] [email protected] [email protected]

AbstractLanguages such as C and C++ use unsafe manual memorymanagement, allowing simple bugs (i.e., accesses to anobject after deallocation) to become the root cause ofexploitable security vulnerabilities. This paper proposesWatchdog, a hardware-based approach for ensuring safeand secure manual memory management. Inspired by priorsoftware-only proposals, Watchdog generates a uniqueidentifier for each memory allocation, associates theseidentifiers with pointers, and checks to ensure that theidentifier is still valid on every memory access. This use ofidentifiers and checks enables Watchdog to detect errorseven in the presence of reallocations. Watchdog stores thesepointer identifiers in a disjoint shadow space to providecomprehensive protection and ensure compatibility withexisting code. To streamline the implementation and reduceruntime overhead: Watchdog (1) uses micro-ops to accessmetadata and perform checks, (2) eliminates metadatacopies among registers via modified register renaming, and(3) uses a dedicated metadata cache to reduce checkingoverhead. Furthermore, this paper extends Watchdog’smechanisms to detect bounds errors, thereby providing fullhardware-enforced memory safety at low overheads.

1. IntroductionLanguages such as C and C++ are the gold standard forimplementing low-level systems software such as operat-ing systems, virtual machine monitors, language runtimes,embedded software, and performance critical software of allkinds. Although safer languages exist (and are widely usedin some domains), these low-level languages persist partlybecause they provide low-level control over memory layoutand use explicit manual memory management (e.g., mallocand free). Unfortunately, the unsafe attributes of these lan-guages result in nefarious bugs that cause memory corrup-tion, program crashes, and serious security vulnerabilities.

This paper focuses primarily on eliminating memory cor-ruption and security vulnerabilities caused by one specificaspect of low-level languages: memory deallocation errors,which directly result in dangling pointer dereference bugs

This research was funded in part by the U.S. Government. The views and conclusionscontained in this document are those of the authors and should not be interpreted asrepresenting the official policies, either expressed or implied, of the U.S. Government.This research was funded in part by DARPA contract HR0011-10-9-0008, ONR awardN000141110596, and NSF grants CNS-1116682, CCF-1065166, and CCF-0810947.

int *p, *q, *r;p = malloc(8);...q = p;...free(p);r = malloc(8);...... = *q;

int* q;void foo() { int a; q = &a;}int main() { foo(); ... = *q;}

Heap based Stack based

Figure 1. Dangling pointer errors involving the heap andstack. On the left, freeing p causes q to become a danglingpointer. The memory pointed to by q could be reallocatedby any subsequent call to malloc(). On the right, foo()assigns the address of stack-allocated variable a to globalvariable q. After foo() returns, its stack frame is popped,and q points to a stale region of the stack, which any inter-vening function call could alter. In both cases, dereferencingq can result in garbage values or data corruption.

and use-after-free (UAF) security vulnerabilities. In essence,we seek to enforce safe manual memory management andthereby eliminate an entire class of security vulnerabilitiesand uncaught memory corruption bugs. Figure 1 shows twouse-after-free errors, including a dangling reference to a re-allocated heap location (left) and a dangling pointer to thestack (right).

Use-after-free (UAF) errors have proven to be just assevere and exploitable as buffer overflow errors: they too po-tentially allow an attacker to corrupt values in memory [6],inject malicious code, and initiate return-to-libc attacks [31].Use-after-free vulnerabilities have been used in real-worldattacks, and many such vulnerabilities in mainstreamsoftware include CVE-2010-0249 in Microsoft’s InternetExplorer, CVE-2009-1690 in Apple’s Safari and Google’sChrome browsers, and CVE-2011-2373 in Mozilla’s Fire-fox browser. The number of use-after-free vulnerabilitiesreported has been increasing in recent years, in contrast withthe decline in buffer overflow vulnerabilities reported [1].

The fundamental cause of use-after-free vulnerabilitiesis that programming languages providing manual memorymanagement (such as C/C++) do not ensure that program-mers use those features safely. Unfortunately, the commonlyadvocated alternative — developing software in garbage-

1

Santosh Nagarakatte
In the Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012)
Santosh Nagarakatte
Santosh Nagarakatte
Santosh Nagarakatte
Santosh Nagarakatte

collected languages — is not always easily applicable in alldomains (such as for operating systems or high-performancereal-time code) where real-time response or low-level con-trol over memory is required. Porting legacy code to thesesafer languages is not always cost effective, and using con-servative garbage collection [4] is not always feasible dueto pause times and observed memory leaks in long-runningsoftware [32].

Given their importance, there has been increased effortin detecting these use-after-free errors, but significant chal-lenges remain. One approach is to randomize the location ofheap objects [19, 28] to probabilistically transform danglingpointer dereferences into hard-to-exploit non-deterministicerrors. Although useful for mitigating some use-after-freeexploits, such approaches fail to detect use-after-free errorson the stack and suffer from locality-destroying memoryfragmentation. Other approaches seek to detect errorsby tracking the allocation/deallocation status of regionsof memory (via shadow space in software [26], withhardware [5, 34], or page-granularity tracking via virtualmemory mechanisms [10, 20]). Alternative approachestrack and check unique identifiers either completely withsoftware [3, 23, 29, 35] or with hardware acceleration [7]. Ingeneral, prior approaches generally suffer from one or moreof the following issues: failure to detect all use-after-freeerrors (due to reallocation of memory), incompatibilities(due to memory layout changes), require significant changesto the tool chain, high memory overheads, or high perfor-mance overheads. Such overheads typically limit the use ofsoftware-only approaches to only debugging contexts ratherthan being used all the time in deployed code. Given thatmanual memory management in low-level languages likeC is still ubiquitous, we seek a highly compatible means ofeliminating use-after-free vulnerabilities completely withlow enough overheads to be used all the time.

This paper proposes Watchdog, a hardware proposal thatenforces safe and secure manual memory management evenin the presence of reallocations. Watchdog builds upon priorproposals for identifier-based use-after-free checking [3, 7,23, 29, 35] using disjoint metadata [23]. The Watchdog hard-ware performs the majority of the work, relying on the soft-ware runtime to inform it about memory allocations anddeallocations. Watchdog associates a unique identifier withevery memory allocation, and it marks the identifier as in-valid on memory deallocations. Every pointer has an associ-ated identifier that is propagated on pointer operations. Forpointers resident in memory, these identifiers are maintainedin a disjoint metadata space. Watchdog performs use-after-free checks to ascertain that the identifier associated with thepointer is still valid before every memory access. As identi-fiers are never reused (they are 64-bit unique values), Watch-dog detects all use-after-free errors even in the presence ofreallocations.

The Watchdog hardware employs several optimizationsto reduce the performance penalties of identifier-basedchecking. First, Watchdog uses micro-op (µop) injection [8]

to perform metadata propagation and checking. Second,Watchdog uses an identifier metadata encoding that reducesthe validity check to a single load-and-compare µop.Third, Watchdog identifies memory operations that loador store pointers conservatively in unmodified binaries (byassuming any 64-bit integer may contain a pointer) or moreprecisely using ISA-assisted identification, which allows thecompiler to annotate loads and stores of pointers. Fourth,the hardware uses copy elimination via register remappingto eliminate metadata copies within the core. Performancemeasurements of a simulated x86 processor on twentybenchmarks indicate that the runtime overhead is 15% onan average for detecting all use-after-free errors.

Furthermore, this paper shows that the basic mechanismsin Watchdog are easily extended to include pointer-basedbounds checking in hardware [9], further supporting the ap-plicability and utility of the approach. Many of the same op-timizations apply directly, resulting in a system with 24%average performance overhead that enforces full memorysafety and thus provides comprehensive prevention of alluse-after-free and buffer overflow vulnerabilities.

2. BackgroundThis section provides background on techniques for de-tecting use-after-free errors that are most closely related toWatchdog; other approaches are discussed in Section 10.Although a long standing problem, there are no hardwareproposals that detect all use-after-free errors (i.e., even inthe presence of reallocations) while maintaining memorylayout compatibility with existing code.

Existing use-after-free checking approaches can bebroadly classified into two categories based on the track-ing and checking scheme used: (1) location based and(2) identifier based. As described in the subsections below,location-based approaches fail to detect use-after-free errorsto re-allocated memory, and thus lack the comprehensive de-tection provided by identifier-based approaches. In contrast,identifier-based approaches typically use inline metadata,which introduces incompatibilities due to memory layoutchanges. Watchdog is the first hardware implementation ofidentifier-based checking with disjoint metadata, allowingit to avoid the incompatibility of inline metadata whileretaining comprehensiveness.

2.1 Location-Based CheckingLocation-based approaches (e.g., [17, 25, 34]) use the loca-tion (address) of the object to determine whether it is allo-cated or not. An auxiliary data structure records the allocat-ed/deallocated status of each location, and it is updated onmemory allocations (e.g. malloc()) and deallocations (e.g.free()). On each memory access, these auxiliary structuresare consulted to determine whether the dereferenced addressis currently valid (allocated) memory. Common implemen-tations of these object-location based approaches use eithera splay tree [11, 17] or a shadow space [2, 25, 34] as theauxiliary data structure to provide efficient lookups.

2

These techniques are good debugging aids and detectmany common errors. However, their inability to detect er-rors in the presence of reallocations fundamentally limitsthe efficacy of these approaches. Whenever a location is re-allocated this approach erroneously allows the dereferenceof a dangling pointer reference to proceed. The informationtracked by this approach is insufficient to determine that thepointer’s original allocation area was freed and is now beingused again for another potentially unrelated purpose.

2.2 Identifier-Based CheckingAn alternative approach is the allocation identifier ap-proach [3, 7, 23, 29, 35], which associates a uniqueidentifier with each memory allocation. Whenever memoryis deallocated (by a call to free() or popping a call stackframe), the unique identifier associated with that allocationis marked as invalid. On a memory access, the systemchecks that the unique allocation identifier associated withpointer used to make the access is still valid. Identifiers arenever reused, and thus any memory access to a deallocatedobject will find that the identifier is no longer valid.

The per-allocation identifier must be tracked so that itcan be checked on every memory access. Thus, each pointerhas an associated identifier, and this identifier is propagatedas the pointer is copied or pointer arithmetic is performed.This metadata is tracked both for pointers in registers andpointers in memory. In memory, this metadata is typicallymaintained inline (i.e., a fat pointer) [3, 7, 29, 35]. Suchinline metadata can introduce code incompatibilities due tomemory layout changes. In addition, arbitrary casts in theprogram can corrupt inline metadata resulting in uncaughterrors. To avoid the problems of inline metadata, a recentlyproposed software-only identifier-based approach employsdisjoint metadata [23]. The key advantage of identifier basedapproaches is the ability to detect all dangling pointer errors.However, as they require a lookup on every memory access,they can have significant performance overheads.

2.3 Implementation Approaches and ComparisonTable 1 provides the comparison of various schemes un-der the common umbrella of location based and identifierbased approaches. Use-after-free checking can be imple-mented: (1) by source-to-source translation, (2) during com-pilation, (3) via binary rewriting, (4) completely in hard-ware, or (5) various hybrids of these approaches. One advan-tage of source-to-source, compiler-based or hybrid instru-mentation is that the identity of instructions manipulatingpointers is known. Thus, identifier based approaches, whichmaintain identifier information with each pointer, can bemore efficiently implemented on the source code or withinthe compiler. In contrast, binary translation and hardware-based approaches generally do not know a priori which in-structions manipulate pointers. Hence, such approaches haveprimarily used location-based checking, resulting in the in-ability to detect errors in the presence of reallocations andthus limiting their effectiveness in preventing security vul-

Approach

Instrument.

Runtime

Metadata

Cas

ts

Com

pre.

Loca

tion

MC [26] Binary 10x

Disjoint Y NJK [17] CompilerLBA [5]

H/W 1.2xSProc [13]MTrac [34]

Iden

tifier

SafeC [3]Source

10x Inline

NY

P&F [29] 5x SplitMSCC [35] 2xChuang [7] Hybrid 1.2x InlineCETS [23] Compiler 2x Disjoint YWatchdog H/W 1.2x

Table 1. Comparison of representative location-based andidentifier-based approaches with instrumentation method,runtime overhead, metadata organization, safety even witharbitrary type casts, and comprehensive (Compre.) error de-tection in the presence of memory reallocations.

nerabilities. Chuang et al. [7] proposed a compiler-basedidentifier-based checker with hardware acceleration and in-line metadata. Its inline metadata changes the layout of ob-jects in memory (which can cause code incompatibilities)and allows arbitrary type casts to corrupt metadata (whichcan compromise comprehensive detection).

3. Watchdog ApproachThe goal of Watchdog is to enforce safe and secure manualmemory management by providing comprehensive detec-tion—detecting all memory management errors even in thepresence of reallocations—and doing so while keeping theoverheads of checking every memory access low enough tobe widely deployed in live systems. To provide comprehen-sive detection, Watchdog employs identifier-based checkingof use-after-free errors almost entirely in hardware, relyingon the software runtime only to provide information aboutmemory allocations and deallocations. To localize the hard-ware changes, this checking is implemented by augmentinginstruction execution by injecting extra µops. Furthermore,Watchdog aims to provide source compatibility (i.e., fewsource code changes) and binary compatibility (i.e., libraryinterfaces unchanged) by leaving the data layout unchangedusing a disjoint shadow space for metadata.

3.1 Operation OverviewIn Watchdog, the hardware is responsible for identifierpropagation and checking. Once the memory allocations areidentified with the help of a modified malloc() and free()runtime library, a unique identifier is provided to everymemory allocation and associated with the pointer pointingto the allocated memory. As pointers can be resident inany register, conceptually Watchdog extends every registerwith a sidecar identifier register. Pointers can also residein memory, so Watchdog provides a shadow memory thatshadows every word of memory with an identifier for point-

3

ld R1 <- memory[R2]

(a) Load

check R2.id R1. id <- shadow[R2.val].id R1.val <-memory[R2.val].val

st memory[R2]<- R1

(b) Store

check R2.id shadow[R2.val].id <- R1.id memory[R2.val].val <- R1.val

add R1 <- R2, R3

if (R2. id != INVALID) R1.id <- R2.id else R1.id <- R3.id R1.val <- R2.val + R3.val

add R1 <- R2,imm

(c) Add immediate

R1. id <- R2.id R1.val <- R2.val + imm

(d) Add

Figure 2. Identifier (id) metadata checking and propagation through load, store, add immediate and add.

ers. To propagate and check the metadata, Watchdog injectsµops in hardware. On memory deallocations, the identifierassociated with the pointer pointing to the memory beingdeallocated is marked as invalid. On every memory access,Watchdog checks to ascertain if the identifier associatedwith the pointer being dereferenced is still valid. Accessinga memory location with an invalid identifier results in anexception. The following subsections explain each of theseoperations performed by Watchdog.

3.2 Checks on Memory AccessesThe hardware performs identifier validity checking byinserting a check µop before every memory access. Thecheck µop uses the sidecar identifier metadata associatedwith the pointer register being dereferenced and checkswhether the retrieved identifier is in the set of valid identi-fiers. A check failure triggers an exception, which can behandled by the operating system by aborting the program orby invoking some user-level exception handling mechanism.Performing an expensive check on every memory accesscould result in large performance overheads, so Section 4describes the techniques we use to avoid expensive va-lidity checks by reorganizing the identifier metadata andemploying hardware caching.

3.3 In-Memory Pointer MetadataAs pointers can be resident in memory, the identifier meta-data also needs to be maintained with pointers in memory. Tomaintain memory layout compatibility, the hardware main-tains the per-pointer metadata in the shadow memory. Con-ceptually, every word in memory has identifier metadata inthe shadow memory. When a pointer is read from memory,the metadata associated with the pointer being read is alsoread from the shadow memory. To implement this behav-ior (see Figure 2a), for every load instruction the Watchdoghardware injects (1) a check µop to perform the check, (2)a µop to perform the load of the actual value into the registerand (3) a shadow_load µop to load the identifier (ID) meta-data from the shadow memory space. Stores are handledanalogously (also shown in Figure 2b). We assume pointersare word aligned (as is required by some ISAs and is gener-ally true with modern compilers even for x86), which allowsthe the shadow load/store µops to accesses the shadow spacevia an aligned load/store in a single cache access.

The shadow space is placed in a dedicated region of thevirtual address space that mirrors the normal data space.

Placing the shadow space into the program’s virtual addressspace allows shadow accesses to be handled as normal mem-ory accesses using the usual address translation and page al-location mechanisms of the operating system. Current 64-bitx86 systems support 48-bit virtual addresses, so the the hard-ware uses a few high-order bits from the available virtual ad-dress space to position the shadow space. This organizationallows the shadow load/shadow store µop to convert an ad-dress to a shadowspace address via simple bit selection andconcatenation.

Accessing the shadow space on every memory operationwould result in significant performance penalties, so Sec-tion 5 describes the mechanisms we use to reduce the num-ber of metadata accesses by inserting metadata load/storeµops only for those memory operations that might actuallyload or store pointer values.

3.4 In-Register MetadataTo ensure that the checks inserted before a memory accesshave the correct identifier metadata, the identifiers must bepropagated with all pointer operations acting on values inregisters (pointer copies and pointer arithmetic). For exam-ple, when an offset is added or subtracted from a pointer,the destination register inherits the identifier of the origi-nal pointer. Figure 2 shows the identifier propagation withaddition operations as a result of pointer arithmetic. Suchregister manipulation instructions are extremely common, socopying the metadata on each operation (say, via an insertedµop) would be extremely costly. Instead, Section 6 describesWatchdog’s use of copy elimination via register renaming toreduce the number of propagation µops inserted.

3.5 Watchdog Usage ModelWatchdog would likely first be deployed on an opt-in basissimilar to the deployment scenario with Microsoft’s page-based no-execute Data Execution Prevention (DEP) feature.Initially, DEP was not enabled by default as it can breaksome programs that use self-modifying code or JIT-basedcode generation. Similarly, Watchdog would not be enabledfor programs that might violate Watchdog’s assumptions un-til after developers have explicitly tested their code with it.Like DEP, which is now enabled by default for most newsoftware, we anticipate similar adoption for Watchdog.

This section has described the basic approach and hasoutlined three implementation optimizations to make it effi-cient: implementing efficient checks, identifying pointer ac-

4

p = malloc(size)

(a) Heap allocation(runtime)

key = unique_identifier++; lock = allocate_new_lock(); *(lock) = key; id = (key, lock); q = setident(p, id);

free(p)

id = getident(p); *(id.lock) = INVALID; add_to_free_list(id.lock)

call

stack_key = stack_key + 1 stack_lock = stack_lock + 8 memory[stack_lock] = stack_key %rsp.id = (stack_key, stack_lock)

(c) Stack allocation(hardware)

(b) Heap deallocation(runtime)

return

memory[stack_lock] = INVALID stack_lock = stack_lock - 8 current_key = memory[stack_lock] %rsp.id = (current_key, stack_lock)

(d) Stack deallocation(hardware)

Figure 3. Identifier (id) allocation and deallocation with malloc/free and call/return

cesses, and register renaming techniques to avoid unneces-sary µops. The next three sections describe these design op-timizations, respectively.

4. Efficient Use-After-Free CheckingWatchdog performs a check as part of every load and store.Memory operations are common, so this operation must belightweight to achieve low performance overheads. This sec-tion describes: (1) the engineering of the identifiers to makelookups cheap and (2) using a hardware cache to further ac-celerate these checks.

4.1 “Lock and Key” LookupsTo accelerate checks, Watchdog uses a hardware implemen-tation of the lock and key checking approach previously usedby some software-only checkers [23, 29, 35]. The lock andkey identifier technique transforms a validity check into justa load and comparison by splitting the identifier into twosub-components: a key (a 64-bit unsigned integer) and alock (a 64-bit address which points to a location in mem-ory). The memory location pointed to by the lock is calledthe lock location. The system maintains the invariant that ifthe identifier is valid, the value contained in the lock loca-tion is equal to the key’s value. With a lock and key iden-tifier, a check is a single memory access plus an equalitycomparison. This check works because: (1) the key is writ-ten into the lock location at allocation time, (2) the contentsof the lock location is changed upon deallocation, and (3)keys are unique and thus no subsequent allocation will everreset the lock location to value that would cause a spuriousmatch (even if the underlying memory or the lock locationsare subsequently reused).

Memory allocation/deallocation occurs when (1) the run-time performs such operations on the heap and (2) new stackframes are created/deleted on function entry and exits. Cor-respondingly, Watchdog allocates/deallocates identifiers onthese operations. Heap allocation is fairly uncommon com-pared to function calls, so Watchdog relies on the runtimesoftware to perform identifier management for the heap.In contrast, the Watchdog hardware performs the identifiermanagement for function calls/returns.

On each heap memory allocation, the software runtimeallocates both a unique 64-bit key and a new lock locationfrom a list of free locations, and the runtime writes the keyvalue into the lock location. The runtime conveys the iden-

tifier to the hardware using the setident instruction thattakes two register inputs: (1) a pointer to the start of thememory being allocated and (2) the unique lock and keyidentifier (128-bit) being assigned as shown in Figure 3a.On memory deallocations, the runtime obtains the identifierassociated with the pointer being freed using the getidentinstruction that takes the pointer being freed as the regis-ter input as shown in Figure 3b. The runtime then uses theidentifier metadata to write an INVALID value to the lock lo-cation. The runtime then returns the lock location to the freelist. To prevent double-frees and calling free() on memorynot allocated with malloc(), the runtime also checks thatthe pointer’s identifier is valid as part of the free() opera-tion.

To perform identifier management for stack frames oncalls and returns, the hardware injects µops to maintain an in-memory stack of lock locations whose top of stack is storedin a new stack_lock control register. The next key to beallocated is maintained with a separate stack_key controlregister. On a function call, the hardware injects four µops to:allocate a new key, push that key onto the in-memory locklocation stack, and associate the new key and lock locationwith the stack pointer (see Figure 3c). On function return,the identifier associated with the stack pointer is restoredto the identifier of the current stack frame. This operationis accomplished by reading the value of the key from thememory location pointed by the stack lock register after thestack manipulation (also 4 µops, as shown in Figure 3d).

Figure 4a illustrates the lock and key identifier schemewhere two pointers p and q point to the different parts of thesame object. Hence both p and q have the same ID in shadowmemory with key being 5 and lock being address 0xB0. Thelock location at address 0xB0 pointed by the lock part of theID also has 5 as the key indicating that the object pointed bythe pointers p and q are still allocated and valid.

Watchdog inserts a check µop before each memory ac-cess. This single µop reads the metadata from a register(which contains both the lock and key), loads the value cur-rently at the lock location, and then compares it to the key.If the value loaded does not match the key, the memory as-sociated with this pointer was previously deallocated, thusthe access is invalid and the hardware raises an exception.Figure 4b illustrates the validity check that consists of a sin-gle memory access plus an equality comparison rather thansome more expensive hash table or tree lookup operation.

5

0xB0p:0x50 5

0xB0q:0x70 5

per pointer idptrs

5 locklocations

0xB0

Key Lock

0x50

0x70

a Lock and key identifier

load R1 <- memory[R2] or

store memory[R2] <- R1

if (R2.id.key != memory[R2.id.lock]) then dangling ptr exception

b Watchdog check

L2 Cache

L3 Cache/Memory

Data Cache

Lock location cache

Core

TLB Instruction

Cache

TLB TLB

c Lock location cache

Figure 4. (a) Watchdog’s lock and key identifier. Here two pointers p and q point to different part of the same object and hencehave the same identifier. (b) the placement of the lock location cache (shaded). (c) Watchdog check before a load and store is amemory access and a equality comparison with lock and key identifier.

4.2 Lock Location CacheEven with this efficient validity check, each check µop stillperforms a memory access, which increases the demandplaced on the cache ports. To mitigate this impact, Watch-dog optionally adds a lock location cache to the core, whichis accessed by the check µop and is dedicated exclusivelyfor lock locations. Just as splitting the instruction and datacaches increases the effective cache bandwidth (by sepa-rating instruction fetches from loads/stores), this additionalcache is used to provide more bandwidth for accessing locklocations. This cache becomes a peer with the instructionand data caches (as shown in Figure 4c), has its own (small)TLB, and uses the same tagging, block size, and state bitsused to maintain coherence among the caches. Memory al-locations and deallocations update lock location values, sothese operations also access the lock location cache. Evena small lock location cache (e.g., 4KB) can be effective be-cause (1) lock locations (8 bytes per object currently allo-cated) are small relative to the average object size and (2) thelock locations region has little fragmentation and exhibitsreasonable spatial locality because lock locations are reallo-cated using a LIFO free list. Cache misses are handled justlike misses in the data cache.

5. Identifying Pointer Load/Store OperationsWatchdog conceptually maintains a lock and key identifierwith every pointer in a register or memory. However, bina-ries for standard ISAs do not provide explicit informationabout which operations manipulate pointers. In the absenceof such information, propagating metadata with every regis-ter and memory operation would require many extra memoryoperations, resulting in substantial performance degradation.This section describes two techniques for identifying pointeroperations, the results of which can be used to reduce thenumber of accesses to the metadata space.

5.1 Conservative Pointer IdentificationTo enable Watchdog to work with reasonable overhead with-out significant changes to program binaries, we observe that,

for current ISAs and compilers, pointers are generally word-sized, aligned, and resident in integer registers. Based on thisobservation, Watchdog conservatively assumes that only a64-bit load/store to an integer register may be a pointer oper-ation, whereas floating point load/stores and sub-word mem-ory accesses are non-pointer operations. Watchdog does notinsert additional metadata manipulation µops for such non-pointer operations.1 As evidence of the effectiveness of thisheuristic, the left bars in Figure 5 show that this approachclassifies 31% of memory operations as potentially load-ing/storing a pointer.

5.2 ISA-Assisted Pointer IdentificationAlthough the above conservative heuristic is effective, wealso explore more precise identification by extending theISA with load and store variants that indicate whether apointer is being loaded or stored. Using these annotations,the compiler, which generally knows which operations aremanipulating pointers, is responsible for conservativelyselecting the proper load/store variants. To experimentallystudy the potential benefits of this approach without per-forming significant modifications to a compiler backend, weused a profiling pass to determine which static instructionsever load or store valid pointer metadata. The profilingpass monitored every access to the metadata space andidentified those static instructions that ever loaded validmetadata. For subsequent runs, we considered these staticmemory operations as having been marked by the compileras load/stores of pointers.2 The right bar of each benchmarkin Figure 5 shows that this approach reduces the numberof memory accesses classified as pointer operations to just18% on average.

1 One potentially problematic case is the manipulation of pointersusing byte-by-byte copies (e.g. memcpy()). We found that com-pilers for x86 typically use word-granularity operations. In othercases, we have modified the standard library as needed.2 Although we see this approximation as primarily an experimentalaide, such an approach might actually be useful for instrumentinglibraries or other code that cannot be recompiled.

6

0%

20%

40%

60%

80%

100% %

of

mem

ory

acc

esse

s

Conservative pointer identification

ISA-assisted pointer identification

lbm

com

pgz

ipm

ilc

bzip

2

amm

p gosje

ng

equa

keh2

64ijp

eg

gobm

k art

twol

f

hmm

er vpr

mcf

mes

agc

cpe

rlav

g

Figure 5. Percentage of memory accesses metadata for conservative and ISA-assisted identification.

6. Decoupled Register MetadataAs described thus far, Watchdog explicitly copies metadataalong with each arithmetic operation in registers. This sec-tion briefly discusses and discards the straightforward ap-proach of widening each register with additional metadata.Although such a design might be appropriate for an in-order processor core, our Watchdog implementation targetsout-of-order dynamically scheduled cores. Thus, this sec-tion: (1) describes a decoupled metadata implementation ofWatchdog in which the data and metadata are mapped to dif-ferent physical registers within the core, (2) discusses whatµops would be inserted to maintain the decoupled registermetadata, and (3) shows how metadata propagation over-heads can be reduced via previously-proposed copy elimi-nation modifications to the register renaming logic.

6.1 Strawman: Monolithic Register Data/MetadataAs presented in Section 3, Watchdog views each registeras being widened with a sidecar to contain identifier meta-data. Although that design is conceptually straightforward—especially for an in-order core—it suffers from inefficien-cies. First, every register write (and most register reads) mustaccess the sidecar metadata, increasing the number of bitsread and written to the register file; although not neces-sarily a performance problem, this could be a energy con-cern. Second, a more subtle issue is that treating registerdata/metadata as monolithic causes operations that write justthe data or metadata to become partial register writes. Thesepartial register accesses introduce unnecessary dependenciesbetween µops, for example, the load µop and the metadataload µop. These serializations can have several detrimentaleffects, including (1) increasing the load-to-use penalty ofpointer loads, (2) stalling subsequent instructions if either ofthe loads miss in the cache, and (3) limiting the memory-level parallelism by serializing the load and the metadataload. In our initial experiments with various implementa-tions of monolithic registers, we found the performance im-pact of such serializations to be significant.

6.2 Decoupled Register Data/MetadataTo address these performance issues, Watchdog decouplesthe register metadata by maintaining the data and metadatain separate physical registers. Each architectural register is

mapped to two physical registers: one for data and one forthe metadata. With this change, individual µops generallyoperate on either the data or the metadata, removing the seri-alization caused by partial register writes of monolithic reg-isters. Once decoupled, the metadata propagation and check-ing is almost entirely removed from the critical path of theprogram’s dataflow graph.

With decoupled metadata, there are multiple cases forwhich register metadata must be propagated or updated.First, instructions such as adding an immediate value to aregister simply copy the metadata from the input register tothe output register. Second, some instructions never gener-ate valid pointers (e.g., the output of a sub-word operationor a divide is not a valid pointer), thus such instructions al-ways set the metadata of the output register to be invalid.Third, either of the registers might be a pointer, so for suchinstructions Watchdog inserts a select µop, which selectsthe metadata from whichever register has valid metadata.3

Watchdog performs metadata propagation by changingthe register renaming to reduce the number of extra µops in-serted. In only one of these three cases described above doesWatchdog actually insert µops; in the other cases (copyingthe metadata or setting it to invalid), Watchdog uses previ-ously proposed modifications to register renaming logic [18,30] to handle these operations completely in the register re-name stage. Watchdog extends the maptable to maintain twomappings for each logical register: the regular mapping anda metadata mapping. Instructions that unambiguously copythe metadata (such as “add immediate”, which has a singleregister input) update the metadata mapping of the destina-tion register in the maptable with the metadata mapping en-try of the input register. This implementation eliminates theregister copies by physical register sharing, as there is a sin-gle copy of the metadata in a physical register [30]. To en-sure that this physical register is not freed until all the map-pings in the maptable are overwritten, these physical regis-ters need to be reference counted. We adopt previously pro-posed techniques to efficiently implement reference countedphysical registers [33]. Figure 6 shows how decoupled meta-data avoids copy µops while avoiding the introduction of un-necessary dependencies between µops.

3 If the ISA was further extended to allow the compiler to annotatesuch instructions, these select µops could also be eliminated.

7

A:B: C: ld r1 <- memory[r2]D: add r3<- r1, 4E: F: G: st memory[r2] <- r3

Program with Watchdog uops

Map-Table

ld p4 <- memory[p2] add p5<- p4, 4

st memory[p2] <- p5

r1:(p1, -),r2:(p2,p6),r3:(p3, -)r1:(p1,p7),r2:(p2,p6),r3:(p3, -)r1:(p4,p7),r2:(p2,p6),r3:(p3, -)r1:(p4,p7),r2:(p2,p6),r3:(p5,p7)r1:(p4,p7),r2:(p2,p6),r3:(p5,p7)r1:(p4,p7),r2:(p2,p6),r3:(p5,p7)r1:(p4,p7),r2:(p2,p6),r3:(p5,p7)

Renamed Instructions

check r2.id ld r1.id <- shadow[r2.val]

check r2.id st shadow[r2.val] <- r3.id

check p6 ld p7 <- shadow[p2]

check p6 st shadow[p2] <- p7

Figure 6. Example illustrating register renaming with Watchdog µops and extensions to the map table. Watchdog insertedµops are shaded. The map table is represented by a tuple for each register. r:(a,b) means logical register r maps to physicalregister a according to the regular map table mapping and the logical register r maps to a 128-bit physical register b accordingto the Watchdog mapping. Watchdog introduced load and store µops access the shadow memory (shadow) for accessing themetadata. The watchdog mapping of − indicates the invalid mapping (the register currently contains a non-pointer value).

7. Implementation ConsiderationsCustom memory allocation. For programs that use custommemory allocators (e.g., by requesting a region of memorywhich it then partitions), by default Watchdog will check theallocation status of the entire region of memory. However, ifthe the programmer instruments the custom memory alloca-tor, Watchdog will then be able to perform exact checkingfor these allocators.

Global variables and initialization. Global variables arenever deallocated, so all pointers to globals (data segment)are given the same single global identifier, which ensuresthat the validity check always passes. One way to obtainthe address of a global variable is by PC-relative addressingmodes. Watchdog handles this case by associating the globalidentifier with any address generated by PC-relative address-ing modes. A related issue is initializing the metadata spacefor the global variables. C programs are allowed to initial-ize non-null pointers in the global space by initializing suchpointers to point to other objects in the global space. To prop-erly set the metadata to support initialized global pointers,Watchdog also initializes the entire metadata shadow spacefor the global data segment with the global identifier.

Multithreading. Although we do not evaluate Watchdogin the context of multithreaded programs, Watchdog wouldideally seamlessly support multithreaded workloads (an is-sue not typically addressed by existing software-only ap-proaches). As with software-only approaches, if operationson pointers occur non-atomically, interleaved execution anddata races (either unintentional races or intentional racesused in lock-free concurrent data structures) can cause er-rors. Watchdog will work seamlessly with multithreadedworkloads if its implementation ensures: (1) the runtime hasa thread-safe way to allocate unique identifiers (easily han-dled with a thread-local variable and partitioning the spaceof identifiers), (2) pointer load and store’s data and metadataaccesses execute atomically, and (3) the check and mem-ory operation occur atomically. Such atomicity could be pro-vided by supporting a two-location atomic update operationin hardware. Alternatively, if we consider only lock-based

data-race free programs, #2 above is not necessary and re-quirement #3 above can be relaxed if the check is moved toafter the memory access and the system can tolerate a singleinvalid access before the exception is raised.

8. Integrating Bounds CheckingAlthough the focus of this paper is on use-after-free vio-lations, Watchdog’s overall approach and implementationwas explicitly designed to mesh well with pointer-basedbounds checking [3, 9, 15, 22, 24, 29, 35], which trackbase and bound metadata with pointers for precise byte-granularity bounds checking of all memory accesses. Themetadata propagation and checking machinery proposed inthis paper can be extended to provide efficient bounds check-ing — and thus hardware-enforced full memory safety.

To implement a fully hardware-enforced pointer-basedmemory bounds checker [9], we extend the Watchdog hard-ware in three ways. First, we extend the in-memory and reg-ister metadata with 64-bit base and 64-bit bound metadata(for a total of 256 bits of metadata per pointer). Second,we correspondingly widened the memory access µops in-jected for accessing in-memory metadata. Third, we extendchecking to perform a range check based on the metadataby either: (1) injecting a new additional bounds check µopfor each memory operation or (2) by extending the currentcheck µop to perform both checks in parallel with one µop.As the bounds check consists of just two inequality com-parisons (and requires no additional memory accesses), ei-ther implementation is likely feasible. We evaluate both al-ternatives. Note that Watchdog’s mechanisms such as µopinjection for metadata propagation/checking, pointer identi-fication, decoupled side-car registers, and copy eliminationusing register renaming techniques all apply directly, whichhighlights that the utility of such techniques extends beyondjust use-after-free checking.

To enforce bounds precisely, the hardware relies on byte-granularity bounds information provided by the compilerand/or runtime whenever a pointer is created [9]. For heapallocated objects, the malloc runtime library can conveysuch bounds information. For pointers to a stack-allocated

8

Clock 3.2 GHzFr

ont-e

nd Bpred 3-table PPM: 256x2, 128x4, 128x4,8-bit tags,2-bit counters

Fetch 16 bytes/cycle. 3 cycle latencyRename Max 6µops per cycle. 2 cycle latencyDispatch Max 6µops per cycle. 1 cycle latency

Win

dow

/Exe

c Registers (160 int + 144 floating point), 2 cycleROB/IQ 168-entry ROB, 54-entry IQIssue 6-wide. Speculative wakeup.Int FUs 6 ALU. 1 branch. 2 ld. 1 st. 2 mul/divFP FUs 2 ALU/convert. 1 mul. 1 mul/div/sqrt.LQ size 64-entry LQSQ size 36-entry SQ

Mem

ory

Hie

rarc

hy

L1 I$ 32KB. 4-way, 64B blocks. 3 cyclesPrefetcher 2-streams, 4 blocks eachL1 D$ 32KB, 8-way, 64B blocks, 3 cyclesPrefetcher 4-streams, 4 blocks eachL1 ↔ L2 bus 32-bytes/cycle. 1 cycle.Private L2$ 256KB, 8-way, 64B blocks, 10 cycles.Prefetcher 8 streams. 16 blocks.L2 ↔ L3 bus 8-stop bi-directional ring.

8-bytes/cycle/hop. 2.0GHz clockShared L3$ 16MB. 16-way, 64B blocks, 25 cyclesMem. Bus 800MHz. DDR. 8-bytes wide.

Dual channel. 16ns latencyLock Location $ 4KB, 8-way, 64B blocks

Table 2. Simulated processor configurations

0%

20%

40%

60%

per

centa

ge

slow

dow

n

Conservative pointer identification

ISA-assisted pointer identification

lbm

com

pgz

ipm

ilc

bzip

2

amm

p gosje

ng

equa

keh2

64ijp

eg

gobm

k art

twol

f

hmm

er vpr

mcf

mes

agc

cpe

rl

Geo

. mea

n

Figure 7. Runtime overhead with conservative and ISA-assisted pointer identification

or global object, precise checking requires the compiler toinsert instructions to convey bounds information at pointercreation points. In the absence of such exact information,the hardware can still perform bounds checking — just lessprecisely — by restricting the bounds of pointers pointing tostack variables and globals to the range of the current stackframe and the global segment, respectively.

9. ExperimentsThis section provides an experimental evaluation of Watch-dog and highlights: (1) its effectiveness in preventing secu-rity exploits, (2) its low performance overheads, and (3) itssynergistic integration with bounds checking to provide fullhardware-enforced memory safety.

9.1 MethodologyWe used an x86-64 simulator that executes the user-levelportions of statically linked 64-bit x86 programs. Thesimulator decodes x86 macro instructions and cracksthem into a RISC-style µops. The out-of-order processorparameters (see Table 2) were selected to be similar toIntel’s Core i7 “Sandy Bridge” processor. We modifiedthe standard DL-malloc memory allocator to use the newinstruction to inform the hardware of memory allocationsand deallocations. We used twenty C SPEC benchmarks,including an enhanced version of the equake benchmark

that uses a proper multidimensional array and thus improvesits baseline performance by 60%. We compiled the bench-marks using the GNU C compiler version 4.4 using standardoptimization flags. We generally used the reference inputs,but used train/test inputs in some cases to ensure reasonablesimulation times. We used 2% periodic sampling witheach sample of 10 million instructions proceeded by a fastforward and a warmup of 480 and 10 million instructionsper period, respectively.

9.2 Efficacy in Preventing Security VulnerabilitiesTo evaluate the effectiveness of Watchdog in preventing use-after-free security exploits, we ran the 291 test cases for use-after-free vulnerabilities (CWE-416 and CWE-562) from theNIST Juliet Test Suite for C/C++ [27], which are modeledafter various use-after-free errors reported in the wild. It suc-cessfully detected and thwarted the attack in all the 291 testcases, and it did so without any false positives.

9.3 Runtime Overheads of Use-after-Free CheckingFigure 7 presents the percentage execution time overhead ofWatchdog over a baseline without any Watchdog instrumen-tation (smaller bars are better as they represent lower run-time overheads). The graphs contains a pair of bars for eachbenchmark. The height of the left and right bars representthe overhead of Watchdog with conservative pointer iden-tification (25% on average) and ISA-assisted pointer iden-

9

0%

20%

40%

60%

80%

per

centa

ge

uop o

ver

hea

d

Other

Pointer Stores

Pointer loads

Checks

lbmco

mpgz

ipm

ilc

bzip

2

amm

p gosje

ng

equa

keh2

64ijp

eg

gobm

k art

twol

f

hmm

er vprm

cf

mes

agc

cpe

rlav

g

Figure 8. µop overhead

tification (15% on average), respectively. More than halfof the benchmarks have runtime overheads less than 5%with ISA-assisted pointer identification. These runtime aresubstantially lower than the 50-133% overhead reported byrelated software-only schemes [23, 35]. More importantly,these overheads are likely low enough for use in productionsystems — critical for preventing security vulnerabilities —rather than just for debugging.

Watchdog performs its functionality by inserting µops,so the total number of µops inserted is instructive in under-standing the sources of execution time overheads. Figure 8presents the µop overhead when employing ISA-assistedpointer identification. The total height of the each bar rep-resents the total µop overhead for the benchmark and eachbar is divided into four segments: (1) checks, (2) pointerloads, (3) pointer stores, and (4) the µops to perform mem-ory allocation/deallocation and identifier propagation in reg-isters. On average, Watchdog executes 44% more µops thanthe baseline. The execution time overhead is lower than theµop overhead because these µops are off the critical path andthus execute in parallel as part of superscalar execution. Thecheck µops account for bulk of the µop overhead (29% onaverage). Pointer metadata load and store µops account for4% and 2% of the extra µops on an average but can be ashigh as 14% and 8%, respectively. The µop overhead due topropagation µops and memory allocation/deallocation oper-ations (on the heap and the stack) account for the remainingµops (9% on average).

One potential source of performance overhead is the ad-ditional cache pressure due to the per-pointer shadowspacemetadata. To isolate this effect, we performed a set of sim-ulations configured to idealize the shadow memory accesses(metadata accesses occupy cache ports but never cache missand to not actually consume space in the data cache). Mak-ing the metadata free of cache effects in this way changedthe runtime overhead by only 4% on average (decrease from15% to 11%), indicating that cache pressure effects are gen-erally not dominant in these benchmarks.

To decrease contention on limited cache ports, the re-sults presented thus far include a 4KB lock location cache(Section 4.2). Figure 9 reports the execution time overheadwithout this cache, in which all check operations use thelimited data load ports. Without the lock location cache,

0%

20%

40%

60%

80%

per

centa

ge

slow

dow

n

118With lock location cache

Without lock location cache

lbmco

mpgz

ipm

ilc

bzip

2

amm

p gosje

ng

equa

keh2

64ijp

eg

gobm

k art

twol

f

hmm

er vprm

cf

mes

agc

cpe

rl

Geo

. mea

n

Figure 9. Performance with lock location cache

0%

50%

100%

150%

200%

per

centa

ge

mem

ory

over

hea

d

Words Pages

lbm

com

pres

sgz

ipm

ilc

bzip

2

amm

p gosje

ng

equa

keh2

64ijp

eg

gobm

k art

twol

f

hmm

er vprm

cf

mes

agc

cpe

rl

Geo

.mea

n

Figure 10. Memory overhead with words and pages

the overhead of Watchdog increases to 24% on average (upfrom 15%). The improvements are especially significant forbenchmarks such as hmmer and h264, as these benchmarksalready have high IPC and frequent memory accesses, bothof which lead to significant contention for the data cacheload ports in the absence of a lock location cache. These re-sults are not particularly sensitive to the exact size of thelock location cache; for a 4KB cache, the miss rate is lessthan 1 miss per 1000 instructions for seventeen of the twentybenchmarks.

Figure 10 presents the memory overheads with ISA-assisted pointer identification over a baseline withoutany instrumentation. The memory overhead is negligiblefor the majority of the benchmarks. However, several ofthe benchmarks approach worst-case overheads of twoshadow pages for each non-shadow page. The memoryoverhead is calculated in two ways: total words of memoryaccessed (left bar) and total 4KB pages of memory accessed(right bar), which reflects on-demand allocation of shadowspace pages by the operating system. On an average, thememory overhead calculated these two ways is 32% and56%, respectively. The difference in these metrics reflectthe impact of fragmentation caused by page-granularityallocation of the shadow space. These memory overheadscompare favorably to the overheads reported for garbagecollection [14] or prior best-effort approaches for mitigatinguse-after-free errors such as heap randomization [20] andobject-per-page approaches [10].

10

0%

20%

40%

60%

80%

per

centa

ge

slow

dow

n Watchdog Watchdog+ bounds check (1 uop) Watchdog + bounds check (2 uop)

lbm

com

pgz

ipm

ilc

bzip

2

amm

p gosje

ng

equa

keh2

64ijp

eg

gobm

k art

twol

f

hmm

er vpr

mcf

mes

agc

cpe

rl

Geo

. mea

n

Figure 11. Runtime overhead with bounds checking with one or two check µops

9.4 Integrating Bounds CheckingBeyond use-after-free checking, Section 8 described howthe proposed hardware can be extended to also performbounds checking, thus enforcing full memory safety.Figure 11 presents the execution time overhead with ISA-assisted pointer identification for Watchdog extended toperform bounds checking by either injecting: (1) a singlecheck µop (which performs both the bound and identifierchecking) or (2) a pair of µops (one µop each for boundsand identifier checking). When implemented as a singleµop, the overall number of µops is unchanged and results inonly a 3% increase in execution time overhead (from 15%to 18%), primarily due to the cache pressure introduced bythe larger in-memory metadata. When the bound check isimplemented as an additional µop injection, the overheadincreases by 6% to a total average overhead over theunmodified baseline of 24% for complete memory safety.

10. Related WorkGarbage collection (GC) is an alternative approach to elim-inating dangling pointers for application domains wheregarbage collection is suitable. When combined with “heapi-fication” [24] of escaping stack objects, garbage collectioncan eliminate all dangling pointers. Further, hardwaresupport to either perform GC completely in hardware [21]or to flexibly accelerate GC have been proposed [16] toreduce the overheads. An alternative approach for enforcingsafe manual memory management (at the cost of somemodifications to the program source code) is to assert thatan object’s reference count is zero at deallocation time [12].

Several other approaches have been used to mitigate use-after-free errors. Probabilistic approaches [19, 28] use a ran-domized memory manager to allocate objects randomly onthe heap and recycle the memory randomly, thereby mak-ing it difficult to exploit the errors. These techniques canleverage virtual address translation mechanism to detect er-rors without inserting any checking code. The open-sourcetools Electric Fence, PageHeap, and DUMA allocate eachobject on a different physical and virtual page. Upon deal-location, the access permission on individual virtual pages

are disabled, increasing memory usage and causing highruntime overheads for allocation-intensive programs [10].These schemes allocate one object per virtual and physi-cal page, resulting in large memory overheads for programswith many small objects. This overhead can be partly miti-gated by placing multiple objects per physical page but map-ping a different virtual page for each object to the sharedpage [10, 20].

11. ConclusionThis paper described Watchdog, a hardware approachfor safe and secure manual memory management vialow-overhead, comprehensive use-after-free checking withdisjoint metadata. To efficiently implement Watchdog,we used ideas from existing software-only approaches,explored ISA support for explicitly identifying pointeroperations, described efficient decoupled register metadatavia register renaming copy elimination, and utilized a locklocation cache to accelerate checks. The resulting overheadis likely low enough to use in production (and not justdebugging). Furthermore, we extended the mechanisms toperform bounds checking. The resulting system ensures fullmemory safety and provides comprehensive protection fromuse-after-free and buffer overflow vulnerabilities for a 24%performance penalty.

References[1] National Vulnerability Database. NIST. http://web.nvd.

nist.gov/.[2] P. Akritidis, M. Costa, M. Castro, and S. Hand. Baggy Bounds

Checking: An Efficient and Backwards-compatible Defenseagainst Out-of-Bounds Errors. In Proceedings of the 18thUSENIX Security Symposium, Aug. 2009.

[3] T. M. Austin, S. E. Breach, and G. S. Sohi. Efficient Detectionof All Pointer and Array Access Errors. In Proceedings ofthe SIGPLAN 1994 Conference on Programming LanguageDesign and Implementation, June 1994.

[4] H.-J. Boehm. Space Efficient Conservative Garbage Collec-tion. In Proceedings of the SIGPLAN 1993 Conference onProgramming Language Design and Implementation, pages197–206, June 1993.

11

[5] S. Chen, M. Kozuch, T. Strigkos, B. Falsafi, P. B. Gibbons,T. C. Mowry, V. Ramachandran, O. Ruwase, M. Ryan, andE. Vlachos. Flexible Hardware Acceleration for Instruction-Grain Program Monitoring. In Proceedings of the 35th AnnualInternational Symposium on Computer Architecture, pages377–388, June 2008.

[6] S. Chen, J. Xu, E. C. Sezer, P. Gauriar, and R. K. Iyer. Non-Control-Data Attacks are Realistic Threats. In Proceedings ofthe 14th conference on USENIX Security Symposium, 2005.

[7] W. Chuang, S. Narayanasamy, and B. Calder. AcceleratingMeta Data Checks for Software Correctness and Security.Journal of Instruction-Level Parallelism, 9, June 2007.

[8] M. L. Corliss, E. C. Lewis, and A. Roth. DISE: A Pro-grammable Macro Engine for Customizing Applications. InProceedings of the 30th Annual International Symposium onComputer Architecture, June 2003.

[9] J. Devietti, C. Blundell, M. M. K. Martin, and S. Zdancewic.Hardbound: Architectural Support for Spatial Safety of the CProgramming Language. In Proceedings of the 13th Interna-tional Conference on Architectural Support for ProgrammingLanguages and Operating Systems, Mar. 2008.

[10] D. Dhurjati and V. Adve. Efficiently Detecting All DanglingPointer Uses in Production Servers. In Proceedings of theInternational Conference on Dependable Systems and Net-works, pages 269–280, June 2006.

[11] F. C. Eigler. Mudflap: Pointer Use Checking for C/C++. InGCC Developer’s Summit, 2003.

[12] D. Gay, R. Ennals, and E. Brewer. Safe Manual Memory Man-agement. In Proceedings of the 2007 International Sympo-sium on Memory Management, Oct. 2007.

[13] S. Ghose, L. Gilgeous, P. Dudnik, A. Aggarwal, and C. Wax-man. Architectural Support for Low Overhead Detection ofMemory Viloations. In Proceedings of the Design, Automa-tion and Test in Europe, 2009.

[14] M. Hertz and E. D. Berger. Quantifying the Performance ofGarbage Collection vs. Explicit Memory Management. 2005.

[15] T. Jim, G. Morrisett, D. Grossman, M. Hicks, J. Cheney, andY. Wang. Cyclone: A Safe Dialect of C. In Proceedings of the2002 USENIX Annual Technical Conference, June 2002.

[16] J. A. Joao, O. Mutlu, and Y. N. Patt. Flexible Reference-Counting-Based Hardware Acceleration for Garbage Collec-tion. In Proceedings of the 36th Annual International Sympo-sium on Computer Architecture, pages 418–428, June 2009.

[17] R. W. M. Jones and P. H. J. Kelly. Backwards-CompatibleBounds Checking for Arrays and Pointers in C Programs. InThird International Workshop on Automated Debugging, Nov.1997.

[18] S. Jourdan, R. Ronen, M. Bekerman, B. Shomar, and A. Yoaz.A Novel Renaming Scheme to Exploit Value Temporal Local-ity through Physical Register Reuse and Unification. In Pro-ceedings of the 31st Annual IEEE/ACM International Sympo-sium on Microarchitecture, Nov. 1998.

[19] M. Kharbutli, X. Jiang, Y. Solihin, G. Venkataramani, andM. Prvulovic. Comprehensively and Efficiently Protecting theHeap. In Proceedings of the 12th International Conference onArchitectural Support for Programming Languages and Op-erating Systems, pages 207–218, Oct. 2006.

[20] V. B. Lvin, G. Novark, E. D. Berger, and B. G. Zorn.Archipelago: Trading Address Space for Reliability and Secu-rity. In Proceedings of the 13th International Conference onArchitectural Support for Programming Languages and Op-erating Systems, pages 115–124, Mar. 2008.

[21] M. Meyer. A Novel Processor Architecture with Tag-FreePointers. In IEEE Micro, 2004.

[22] S. Nagarakatte, J. Zhao, M. M. K. Martin, and S. Zdancewic.SoftBound: Highly Compatible and Complete Spatial Mem-ory Safety for C. In Proceedings of the SIGPLAN 2009 Con-ference on Programming Language Design and Implementa-tion, June 2009.

[23] S. Nagarakatte, J. Zhao, M. M. K. Martin, and S. Zdancewic.CETS: Compiler Enforced Temporal Safety for C. In Pro-ceedings of the 2010 International Symposium on MemoryManagement, June 2010.

[24] G. C. Necula, J. Condit, M. Harren, S. McPeak, andW. Weimer. CCured: Type-Safe Retrofitting of Legacy Soft-ware. ACM Transactions on Programming Languages andSystems, 27(3), May 2005.

[25] N. Nethercote and J. Seward. How to Shadow Every Byteof Memory Used by a Program. In Proceedings of the 4thACM SIGPLAN/SIGOPS International Conference on VirtualExecution Environments, pages 65–74, 2007.

[26] N. Nethercote and J. Seward. Valgrind: A Framework forHeavyweight Dynamic Binary Instrumentation. In Proceed-ings of the SIGPLAN 2007 Conference on Programming Lan-guage Design and Implementation, pages 89–100, June 2007.

[27] NIST Juliet Test Suite for C/C++. NIST, 2010.http://samate.nist.gov/SRD/testCases/suites/Juliet-2010-12.c.cpp.zip.

[28] G. Novark and E. D. Berger. DieHarder: Securing the Heap.In Proceedings of the 17th ACM Conference on Computer andCommunications Security, pages 573–584, 2010.

[29] H. Patil and C. N. Fischer. Low-Cost, Concurrent Checkingof Pointer and Array Accesses in C Programs. Software —Practice & Experience, 27(1):87–110, 1997.

[30] V. Petric, T. Sha, and A. Roth. RENO: A Rename-Based In-struction Optimizer. In Proceedings of the 32nd Annual Inter-national Symposium on Computer Architecture, June 2005.

[31] J. Pincus and B. Baker. Beyond Stack Smashing: Recent Ad-vances in Exploiting Buffer Overruns. IEEE Security & Pri-vacy, 2(4):20–27, 2004.

[32] J. Rafkind, A. Wick, M. Flatt, and J. Regehr. Precise GarbageCollection for C. In Proceedings of the 2009 InternationalSymposium on Memory Management, June 2009.

[33] A. Roth. Physical Register Reference Counting. IEEE TCCAComputer Architecture Letters, 7(1), Jan. 2008.

[34] G. Venkataramani, B. Roemer, M. Prvulovic, and Y. Solihin.MemTracker: Efficient and Programmable Support for Mem-ory Access Monitoring and Debugging. In Proceedings ofthe 13th Symposium on High-Performance Computer Archi-tecture, pages 273–284, Feb. 2007.

[35] W. Xu, D. C. DuVarney, and R. Sekar. An Efficient andBackwards-Compatible Transformation to Ensure MemorySafety of C Programs. In Proceedings of the 12th ACM SIG-SOFT International Symposium on Foundations of SoftwareEngineering (FSE), pages 117–126, 2004.

12