View
223
Download
1
Embed Size (px)
Citation preview
The general motivation
Q: Why do we want to type low level
languages?
A: We want to reap the benefits of type systems at low levels.
Advantages of type systems
•Capturing program errors at compile-time (well-known)
•Enabling aggressive compiler optimizations (recent)
•Supporting sophisticated module systems (SML)
•Facilitating program verification (NuPRL, Coq, PVS)
•Serving as program documentation
The goal of this work
The goal is to capture memory safety of assembly code through a type system
Memory Safety = Type Safety + Safe Array Access
Array bounds checking problem
Array bounds checking refers to determining whether the value of an expression is within the bounds of an array when the expression is used to index the array.
Byte copy: A version in C
voidbcopy(int src[], int dst[]) { int i;
if (length(src) != length(dst) { printf “bcopy: unequal lengths\n”; exit(1); } for(i=1; i < length(src), i++) dst[ii] = src[i];}
Dynamic array bounds checking
• is required for safe languages such as Java, Modula-3, ML, Pascal
• can be expensive in practice (e.g. numerical computation)
• bounds violation is a rich source of program errors in unsafe languages such as C, C++ (e.g. off-by-one error)
Some experimental results
0102030405060
bco
py
bin
ary
sear
ch
bub
ble
sor
t
mat
rix m
ult
quee
n
quic
k so
rt
han
oi t
ower
w/ checks
w/o checks
Static array bounds checking
• Flow Analysis– no annotations required (fully
automatic)– limited to simple cases and sensitive
to program structures– little or no feedback for detecting
program errors
Static array bounds checking
• Type-based approaches– ML type system is too coarse– full dependent types is too fine– dependent ML provides an intermediate
type system with practical type-checking• it is adequate for array bounds checking
elimination• the programmer must write some type
annotations
What are dependent types?
Dependent types depend on the values of language expressions.
For instance, type : dependent type array : array(x) int : int(x) stack : stack(x)
Byte copy: A version in de Caml
let bcopy src dst = begin for i = 0 to vect_length(src) - 1 do dst..(i) <- src..(i) doneendwithtype {n:nat} int vect(n) -> int vect(n) -> unit
Array bounds checking in mobile code
• It needs to be enforced for safety concerns
• It is difficult to eliminate since the machine which executes the code may not trust the source of the code
• It is time-consuming to be compiled away
Increment: A flow chart
i:int
l:label
sp: pop r1
l:labelsp:
r1 = i add r1, r1, 1
r1 = i + 1
sp:
l:labelpop r2 r1 = i + 1
r2 = l
i+1:intsp:
r2 = l
push r1
jmp r2continue ...
State types
A state type corresponds to code continuation. It records the type information about register file and stack.
For instance,• [r1: int(i), r2: int array(i)]• (‘a)[r1: ‘a, r2: [r1: ‘a]]• (‘a,‘b)[r1: ‘a, r2: ‘b, r3: [r1: ‘a, r2: ‘b]]• {n:nat}[sp: [sp: stack(n)] :: stack(n)]
Increment: A version in DTAL
inc:{i:int}{n:nat} [sp: int(i):: [sp: int(i+1)::stack(n)]::stack(n)]
pop r1 // r1: int(i) add r1, r1, 1 // r1: int(i+1) pop r2 // r2: [sp:int(i+1)::stack(n)] push r1 // sp: int(i+1)::stack(n) jmp r2
Type index objects
• index i,j ::= a | c | i+j | i-j | i*j | i/j
• index prop P ::= false | true | i<j | i<=j | i=j| i>=j| i>j | not P | P1 and P2 | P1 or P2
• index sort gamma ::= int | {a: gamma | P} For instance, nat is a shorthand for {a: int | a >= 0}
Types
• types tau ::= alpha | sigma | int(x) | tau array(x) | stack(x) | prod(tau1,…,taun) | {a:gamma}.tau
• state types sigma ::=[(alpha1,…,alphan){a1:gamma1,…,an:gamman}.state]
• state state ::= (register file,stack)
Note: int is for {a:int}.int(a) nat is for {a:nat}.int(a)
Instructions in DTAL
• instructions ins ::= aop rd, rs, v | bop r, v | jmp v | load rd, rs(v) | store r |
newtuple[tau] r | newarray[tau] r | mov r, v | pop r | push r
• values v ::= () | i | l | r
• instruction sequences I ::= halt | ins; I | l; I
Memory allocation
• Tuple of type prod(tau1,…,taun)
• Array of type tau array(n)
tau1
tau2
taun-1
taun
tautau
tautau
Array types are non-variant
tau1 <= tau2 does not implies tau1 array(n) <= tau2 array(n)
Here is a counterexample: r1: nat array(2) r2: int array(2)
00
r1 = = r2
Typing conditional jumps
r = 0?
assumption;state
assumption;state
beq r,v; I
r:int(x)
assumption;state
v:[state’]
assumption,x!=0;state I
assumption,x=0;state state’
Byte copy: A flow chart
r4 <- 0
loop
bcopy
r5 <- r1 - r4
r5 > 0? finish
r5 <- r2(r4)r3(r4) <- r5
r4 <- r4 + 1
r1: array size r2: src r3: dst
Byte copy: A version in DTAL
bcopy:{i:nat} [r1: int(i), r2: int array(i), r3: int array(i)] mov r4, 0 // r4 <- 0 jmp loop // start loop
Byte copy: a version in DTAL
loop: {i:nat, k:nat} [r1: int(i), r2: int array(i) r3: int array(i), r4: int(k)] sub r5, r1, r4 // r5 = r1 - r4 blte r5, finish // r5 <= 0 ? load r5, r2(r4) // safe load store r3(r4), r5 // safe store add r4, r4, 1 // r4 <- r4 + 1 jmp loop // loop againfinish:[] halt
Operational semantics of DTAL
We use a standard abstract machine for assigning operational semantics to DTAL programs. The machine consists of three finite maps:
(Heap, Register File, Stack)
Soundness
• The execution of a DTAL program can either– terminate normally, or– run forever, or– stall.
• A well-typed DTAL program can never stall.
Related work
Here is a (partial) list of some closely related work.– Dependent types in practical
programming (Xi & Pfenning)– TALC Compiler (Morrisett et al at Cornell)– Safe C compiler (Necula & Lee)– TIL compiler (the Fox project at CMU)
Current status & Future work
• We have finished the following.– Theoretical development of DTAL– A prototype implementation of a type-
checker for DTAL
• We are working on the following.– Designing a dependent type system
of JVML (de JVML)– Compiling (a subset of Java) into de
JVML