Using Coq to generate and reason about x86 systems code

Andrew Kennedy & Nick Benton (MSR Cambridge)

Jonas Jensen (ITU Copenhagen)

The big picture

Compositional specification and verification of high-level behavioural properties of low-level systems code.

Previous work of Benton et al. employed idealized machine code:
- Simple design
- Infinite memory; pointers are natural numbers

It’s time to get real(ish): hence, x86.

Overview of talk

- Modelling x86: bits, bytes, instructions, execution
- Generating x86: assembling & compiling
- Reasoning about x86: logic & proofs
- Discussion

Our approach

- Clean slate: trusted base is just hardware and its model in Coq. †
- No dependencies on legacy code, languages, compilers, or software architectures
- Verify everything, including (at some point) loader-verifier
- Do everything in Coq, making effective use of computation, notation, type classes, tactics, etc.
- No dependencies on external tools
- Coq as “world’s best macro assembler”

† And a small boot loader

Modelling x86

Bits, bytes and words

We want to compute correctly and efficiently inside Coq:
- Proper modelling of n-bit words, arithmetic with carry, sign, overflow, rotates, shifts, padding, the lot, all O(n)
- Generic over word-length, so index type by n : nat

We also want to reason soundly inside Coq:
- Associativity, commutativity, order properties, etc.

Two representations:
- Compute here: 𝔹^n, n-tuples of bools
- Reason here: ℤ_(2^n), i.e. 'Z_(2^n) from the ssreflect library; reuse lemmas

Example: definition of addition

Effective use of dependent types.

Definition is very algorithmic, so we can compute!

Performance inside Coq? On this machine, about 2000 additions a second.
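The slide's Coq definition did not survive extraction. As a rough illustration of what an "algorithmic" addition over n-tuples of bools can look like, here is a minimal Python sketch of ripple-carry addition. The names (`from_nat`, `to_nat`, `adc`, `add`) and the least-significant-bit-first ordering are assumptions of this sketch, not the authors' definitions, and Python cannot index the type by n the way the Coq version does.

```python
from typing import Tuple

Bits = Tuple[bool, ...]  # least-significant bit first (an assumption of this sketch)


def from_nat(n: int, value: int) -> Bits:
    """Encode value modulo 2**n as an n-tuple of bools, LSB first."""
    return tuple(bool((value >> i) & 1) for i in range(n))


def to_nat(bits: Bits) -> int:
    """Decode an LSB-first tuple of bools to a natural number."""
    return sum(1 << i for i, b in enumerate(bits) if b)


def full_adder(a: bool, b: bool, carry: bool) -> Tuple[bool, bool]:
    """Return (carry_out, sum_bit) for one bit position."""
    total = int(a) + int(b) + int(carry)
    return total >= 2, bool(total & 1)


def adc(carry: bool, xs: Bits, ys: Bits) -> Tuple[bool, Bits]:
    """Add with carry-in; a single pass over the bits, so O(n)."""
    if len(xs) != len(ys):
        raise ValueError("both words must have the same width")
    out = []
    for a, b in zip(xs, ys):
        carry, s = full_adder(a, b, carry)
        out.append(s)
    return carry, tuple(out)


def add(xs: Bits, ys: Bits) -> Bits:
    """n-bit addition that drops the carry-out, i.e. addition modulo 2**n."""
    return adc(False, xs, ys)[1]
```

For example, `to_nat(add(from_nat(8, 200), from_nat(8, 100)))` wraps around modulo 256, and the carry-out returned by `adc` records that the wrap happened.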

Example: proofs about addition

1. Deal with n=0 case
2. Apply injectivity of toZp to work in 'Z_(2^n): forall x y, toZp x = toZp y -> x = y
3. Rewrite using homomorphism lemmas, e.g. toZp (addB p1 p2) = (toZp p1 + toZp p2)%R
4. Apply ssreflect “ring” lemma for 'Z_(2^n)
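The proof strategy relies on two facts about toZp: it is injective, and it maps bit-level addition to addition modulo 2^n. The following Python sketch checks both exhaustively for small widths, along with one consequence (commutativity, chosen here only as an example of the kind of lemma such a proof might target). It is a test, not a proof, and `add_bits`, `to_zp` and `check` are hypothetical names for this sketch.

```python
from itertools import product


def add_bits(xs, ys):
    """Ripple-carry addition on LSB-first tuples of bools, dropping the final carry."""
    carry, out = False, []
    for a, b in zip(xs, ys):
        total = int(a) + int(b) + int(carry)
        out.append(bool(total & 1))
        carry = total >= 2
    return tuple(out)


def to_zp(bits):
    """Map a bit tuple to its residue modulo 2^n; plays the role of toZp."""
    return sum(1 << i for i, b in enumerate(bits) if b) % (1 << len(bits))


def check(n):
    """Exhaustively check the facts the proof uses, for one width n."""
    words = list(product([False, True], repeat=n))
    modulus = 1 << n
    # Step 2: to_zp is injective, so equalities can be moved into Z/(2^n).
    injective = len({to_zp(w) for w in words}) == len(words)
    # Step 3: to_zp turns bit-level addition into addition modulo 2^n.
    homomorphic = all(
        to_zp(add_bits(x, y)) == (to_zp(x) + to_zp(y)) % modulus
        for x in words
        for y in words
    )
    # Consequence: a ring identity (here commutativity) holds at the bit level.
    commutative = all(add_bits(x, y) == add_bits(y, x) for x in words for y in words)
    return injective and homomorphic and commutative
```

Calling `check(n)` for n from 0 to 4 covers the n=0 corner case that the proof treats separately.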

Machine state

- Register state is just a total function
- Flags can take on an undefined value (see later)
- Abstractly, memory is a partial function DWORD ⇀ BYTE; partiality represents whether memory is mapped and accessible
- Concretely, for efficiency, a trie-like structure
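A minimal Python sketch of a state with this shape follows. It uses a dict where the slide mentions a trie-like structure, and the register and flag names listed are assumptions made for illustration rather than the exact set the authors model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed register and flag sets for this sketch.
REGISTERS = ("EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP", "EIP")
FLAGS = ("CF", "PF", "ZF", "SF", "OF")
MASK32 = 0xFFFFFFFF


@dataclass
class State:
    # Total function: every register always has a 32-bit value.
    regs: Dict[str, int] = field(default_factory=lambda: {r: 0 for r in REGISTERS})
    # None stands for the "undefined" flag value.
    flags: Dict[str, Optional[bool]] = field(
        default_factory=lambda: {f: None for f in FLAGS}
    )
    # Partial function from 32-bit address to byte: a missing key means unmapped.
    mem: Dict[int, int] = field(default_factory=dict)


def read_byte(state: State, addr: int) -> Optional[int]:
    """Return the byte at addr, or None if the address is not mapped."""
    return state.mem.get(addr & MASK32)


def read_dword(state: State, addr: int) -> Optional[int]:
    """Read four bytes little-endian; None if any of them is unmapped."""
    parts = [read_byte(state, addr + i) for i in range(4)]
    if any(p is None for p in parts):
        return None
    return sum(p << (8 * i) for i, p in enumerate(parts))
```

Returning `None` for unmapped addresses is one way to expose the partiality of memory to callers; the Coq model may represent it differently.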

x86 instructions

- x86 is notoriously large and baroque (instruction set manual alone is 1640 pages long)
- Subset only: no legacy 16-bit mode, flat memory model (no segment nonsense), no floating point, no SIMD instructions, no protected-mode instructions, no 64-bit mode (yet)
- Actually: not too bad, possible to factor so that Coq datatype is “total” (no junk)

Addressing modes

e.g. ADD EBX, EDI + [EDX*4] + 12
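The example combines a base register, a scaled index register and a constant displacement. As a small sketch, assuming the usual base + index*scale + displacement form with 32-bit wrap-around, the address computation can be written as below; `effective_address` is a hypothetical helper, not a function from the authors' development.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base: int, index: int, scale: int, displacement: int) -> int:
    """Compute base + index*scale + displacement, wrapping at 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows scale factors 1, 2, 4 and 8")
    return (base + index * scale + displacement) & MASK32
```

With EDI = 0x1000 and EDX = 3, `effective_address(0x1000, 3, 4, 12)` gives the address used by the example operand.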

Instruction format

Manuals don’t reveal much “structure” (such as it is) in the instruction format.

But it can be discerned, and utilised for concise decoding functions.

Instruction decoding

Uses monadic syntax; reader reads from memory and advances pointer.

Note: there may be many instruction formats for the same instruction.
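To illustrate both points, here is a small Python sketch in which a stateful `Reader` object stands in for the reader monad, and a `decode` function handles a four-opcode fragment. The opcodes are standard x86 (0x90 NOP, 0xC3 RET, 0xEB JMP rel8, 0xE9 JMP rel32); the tuple representation of instructions and all names are assumptions of this sketch.

```python
MASK32 = 0xFFFFFFFF


class Reader:
    """Reads bytes from a memory image and advances a cursor as it goes."""

    def __init__(self, mem, pos):
        self.mem = mem  # dict from address to byte
        self.pos = pos

    def byte(self):
        value = self.mem[self.pos]  # KeyError models unmapped memory in this sketch
        self.pos = (self.pos + 1) & MASK32
        return value

    def dword(self):
        # Little-endian: the first byte read is the least significant.
        return sum(self.byte() << (8 * i) for i in range(4))


def signed(value, bits):
    """Reinterpret an unsigned value of the given width as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def decode(reader):
    """Decode one instruction; relative jumps become absolute targets."""
    opcode = reader.byte()
    if opcode == 0x90:
        return ("NOP",)
    if opcode == 0xC3:
        return ("RET",)
    if opcode == 0xEB:  # short form: 8-bit displacement
        rel = signed(reader.byte(), 8)
        return ("JMP", (reader.pos + rel) & MASK32)
    if opcode == 0xE9:  # near form: 32-bit displacement
        rel = signed(reader.dword(), 32)
        return ("JMP", (reader.pos + rel) & MASK32)
    raise ValueError(f"opcode {opcode:#04x} is outside this sketch's subset")
```

The two JMP cases show two encodings decoding to the same abstract instruction: the bytes `EB FE` and `E9 FB FF FF FF`, both placed at address 0x100, each decode to a jump back to 0x100.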

Instruction execution

- Currently, a partial function from State to State
- Implemented in monadic style, using “primitive” operations of r/w register, r/w flag, r/w memory, etc.
- Factored to re-use common patterns, e.g. evalMemSpec, evalSrc

Example fragment: call and return
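The Coq fragment for call and return is not in the extracted text. The sketch below shows the standard x86 behaviour of the two instructions as partial functions in Python, returning `None` on a fault. The split of the state into a `regs` dict and a `mem` dict, and the helper names, are assumptions of this sketch; EIP is assumed to already point past the CALL instruction when `exec_call` runs.

```python
MASK32 = 0xFFFFFFFF


def read_dword(mem, addr):
    """Read four bytes little-endian, or return None if any is unmapped."""
    parts = [mem.get((addr + i) & MASK32) for i in range(4)]
    if any(p is None for p in parts):
        return None
    return sum(p << (8 * i) for i, p in enumerate(parts))


def write_dword(mem, addr, value):
    """Return a new memory with value stored little-endian, or None if unmapped."""
    addrs = [(addr + i) & MASK32 for i in range(4)]
    if any(a not in mem for a in addrs):
        return None
    new_mem = dict(mem)
    for i, a in enumerate(addrs):
        new_mem[a] = (value >> (8 * i)) & 0xFF
    return new_mem


def exec_call(regs, mem, target):
    """CALL: push the return address (current EIP), then jump to target."""
    esp = (regs["ESP"] - 4) & MASK32
    new_mem = write_dword(mem, esp, regs["EIP"])
    if new_mem is None:
        return None  # the stack slot is not mapped: no successor state
    return {**regs, "ESP": esp, "EIP": target & MASK32}, new_mem


def exec_ret(regs, mem):
    """RET: pop the return address into EIP."""
    return_address = read_dword(mem, regs["ESP"])
    if return_address is None:
        return None
    return {**regs, "ESP": (regs["ESP"] + 4) & MASK32, "EIP": return_address}, mem
```

A call followed immediately by a return restores the original ESP and EIP, which is a quick sanity check on the pair.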

Non-determinism & under-specification

Representing non-determinism and under-specification

- For sequential x86, for the subset we care about, almost completely deterministic
- Flags are the main issue
- Introduce “undefined” state for flags
- Instructions that depend on a flag whose value is undefined (e.g. branch-on-carry) then have unspecified behaviour
- An alternative would be to set flags non-deterministically (cf. RockSalt)
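The two options can be contrasted in a few lines of Python. In this sketch `None` stands for the undefined flag value; `branch_on` gives no successor when the flag is undefined, while `branch_on_nondet` returns every possible successor. All names are hypothetical, and the second function only illustrates the general idea of the non-deterministic alternative, not RockSalt's actual definitions.

```python
from typing import Dict, Iterable, Optional, Set

Flags = Dict[str, Optional[bool]]  # None stands for the "undefined" flag value


def clobber(flags: Flags, names: Iterable[str]) -> Flags:
    """Mark the named flags undefined, as after an instruction that leaves them unspecified."""
    return {**flags, **{name: None for name in names}}


def branch_on(flags: Flags, flag: str, next_eip: int, target: int) -> Optional[int]:
    """Conditional branch with an explicit undefined state.

    Returns the next EIP, or None when the flag is undefined, meaning the
    model says nothing about what happens next.
    """
    value = flags[flag]
    if value is None:
        return None
    return target if value else next_eip


def branch_on_nondet(flags: Flags, flag: str, next_eip: int, target: int) -> Set[int]:
    """The alternative: treat an undefined flag as either value, giving a set of successors."""
    value = flags[flag]
    if value is None:
        return {next_eip, target}
    return {target if value else next_eip}
```

Under the first reading, any proof about a program must show that a branch never consults an undefined flag; under the second, it must show the program is correct whichever way the branch goes.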

Generating x86: Assembling and Compiling

Instruction encoding

- Directly represent encoding by list of bytes
- Note: encoding is position-dependent
- In future we might mirror decoding using a monadic style
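Position-dependence is easiest to see with a jump. The instruction type holds an absolute target, but the near form of JMP (opcode 0xE9) stores a 32-bit displacement relative to the end of the instruction, so the same abstract instruction yields different bytes at different addresses. A minimal sketch, with `encode_jmp` as a hypothetical helper:

```python
MASK32 = 0xFFFFFFFF


def encode_jmp(position: int, target: int) -> list:
    """Encode a jump to an absolute target as E9 rel32, for code placed at position."""
    # The displacement is measured from the end of this 5-byte instruction.
    rel = (target - (position + 5)) & MASK32
    return [0xE9] + [(rel >> (8 * i)) & 0xFF for i in range(4)]
```

Encoding a jump to 0x100 at address 0x100 and at address 0x200 gives two different byte lists.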

Jumps and labels

Targets of jumps and branches are just absolute addresses in the Instr type. To write assembler code we want labels; for this we use a kind of HOAS type:

Syntax matters

Cute use of notation in Coq: can write assembler code more-or-less using syntax of real assemblers!

But also make use of Coq definitions, and “macros”.

[Code example with callouts: while macro, label, label binding]

Assembling

Given an assembler program and an address to locate it, we can produce a sequence of bytes in the usual “two-pass” way:
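The slide's Coq code is missing from the extracted text. As one possible reading of "HOAS labels plus two-pass assembly", the Python sketch below represents a program as a small tree in which a `Local` node binds a label by taking a function from the label's address to the rest of the program. Pass 1 hands out placeholder labels to discover where each label lands; pass 2 replays the program with the real addresses. Every name here is hypothetical, and the sketch assumes instruction sizes do not depend on label values and that each bound label is placed exactly once.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, List, Union


@dataclass
class Instr:
    encode: Callable[[int], List[int]]  # position -> bytes (position-dependent)


@dataclass
class Seq:
    first: "Program"
    second: "Program"


@dataclass
class LabelHere:
    label: int  # states that this label's address is the current position


@dataclass
class Local:
    body: Callable[[int], "Program"]  # HOAS-style binder: body receives the label's address


Program = Union[Instr, Seq, LabelHere, Local]


def _walk(prog, pos, fresh, on_label, out):
    """Emit prog at pos into out and return the position just after it."""
    if isinstance(prog, Instr):
        code = prog.encode(pos)
        out.extend(code)
        return pos + len(code)
    if isinstance(prog, Seq):
        middle = _walk(prog.first, pos, fresh, on_label, out)
        return _walk(prog.second, middle, fresh, on_label, out)
    if isinstance(prog, LabelHere):
        on_label(prog.label, pos)
        return pos
    if isinstance(prog, Local):
        return _walk(prog.body(fresh()), pos, fresh, on_label, out)
    raise TypeError(f"not a program: {prog!r}")


def assemble(prog, origin):
    # Pass 1: number the binders 0, 1, 2, ... and record where each label lands.
    positions = {}
    ids = count()
    _walk(prog, origin, lambda: next(ids), positions.__setitem__, [])

    # Pass 2: supply the recorded addresses in the same order and check them.
    def check(label, pos):
        if label != pos:
            raise ValueError("label address changed between passes")

    ids = count()
    out = []
    _walk(prog, origin, lambda: positions[next(ids)], check, out)
    return out


def jmp(target):
    """JMP to an absolute target, encoded as E9 rel32."""
    def encode(pos):
        rel = (target - (pos + 5)) & 0xFFFFFFFF
        return [0xE9] + [(rel >> (8 * i)) & 0xFF for i in range(4)]
    return Instr(encode)


NOP = Instr(lambda pos: [0x90])


def forever(body):
    """A macro that uses a label internally; the label cannot clash with the caller's."""
    return Local(lambda top: Seq(LabelHere(top), Seq(body, jmp(top))))
```

Because the label is bound by an ordinary function, macros such as `forever` nest without any renaming: `assemble(forever(forever(NOP)), 0)` resolves the inner and outer labels independently.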

Round-trip theorem

Statement of correctness uses overloaded “points-to” predicate, to be described later.

[Theorem statement with callouts: memory between offset and endpos contains bytes; memory between offset and endpos decodes to prog]
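The theorem itself is stated and proved in Coq. The Python sketch below only tests the same property on a three-instruction fragment: assemble a program at an offset, load the bytes into memory, decode from the offset, and compare. The instruction representation and all function names are assumptions of this sketch.

```python
MASK32 = 0xFFFFFFFF


def encode(pos, instr):
    """Encode one instruction placed at address pos."""
    if instr == ("NOP",):
        return [0x90]
    if instr == ("RET",):
        return [0xC3]
    if instr[0] == "JMP":
        rel = (instr[1] - (pos + 5)) & MASK32
        return [0xE9] + [(rel >> (8 * i)) & 0xFF for i in range(4)]
    raise ValueError(f"unsupported instruction: {instr!r}")


def decode(mem, pos):
    """Decode one instruction at pos; return it with the position that follows it."""
    opcode = mem[pos]
    if opcode == 0x90:
        return ("NOP",), pos + 1
    if opcode == 0xC3:
        return ("RET",), pos + 1
    if opcode == 0xE9:
        rel = sum(mem[pos + 1 + i] << (8 * i) for i in range(4))
        end = pos + 5
        return ("JMP", (end + rel) & MASK32), end
    raise ValueError(f"unsupported opcode: {opcode:#04x}")


def assemble(offset, prog):
    """Concatenate encodings, giving each instruction its own address."""
    out = []
    for instr in prog:
        out.extend(encode(offset + len(out), instr))
    return out


def round_trips(offset, prog):
    """Check that memory between offset and endpos decodes back to prog."""
    code = assemble(offset, prog)
    mem = {offset + i: b for i, b in enumerate(code)}
    endpos = offset + len(code)
    pos, decoded = offset, []
    while pos < endpos:
        instr, pos = decode(mem, pos)
        decoded.append(instr)
    return pos == endpos and decoded == list(prog)
```

A call such as `round_trips(0x100, [("NOP",), ("JMP", 0x100), ("RET",)])` exercises the position-dependent case, since the jump's displacement depends on where the instruction sits.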

Little languages

Instead of trusting, or modelling, existing languages such as C, we plan to develop little languages inside Coq.

We have experimented with a tiny imperative language and its “compiler”, proved correct in Coq.

Code demo!

Reasoning about x86: Logic and Proof

Big picture

- Assertion logic: predicate on partial states, usual connectives + separating conjunction
- Specification logic over this, incorporates step-indexing and framing, with corresponding later and frame connectives
- Safety specification used to give rules for instructions, in CPS style, packaged as Hoare-style triples for non-jumpy instructions
- Treatment of labels makes for elegant definition and rules for macros (e.g. while, if)

Partial states

Partiality denotes partial description, as usual for separation logic.

Not to be confused with use of partiality for flags (undefined state) and memory (un-mapped or inaccessible).

Assertion logic

Assertions (= SPred) are predicates on partial states.

We define a separation logic of assertions, with usual connectives. Example rules:

Points-to predicate for memory is overloaded for different “decoders” of memory.

Core definition: memory from p to q “decodes” to value x, where x could be a BYTE, a DWORD, a seq BYTE or even an Instr.
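To make "predicates on partial states" concrete, the Python sketch below models a partial state as a finite map from addresses to bytes only (a simplification: it leaves out registers and flags). Assertions are functions from such a map to a bool, and the separating conjunction holds when the map splits into two disjoint parts satisfying each side. The DWORD points-to is then built from four separated byte points-to assertions. The names are hypothetical and the brute-force `splits` is only suitable for tiny states.

```python
from itertools import combinations


def splits(heap):
    """Yield every way of dividing heap into two disjoint parts."""
    keys = list(heap)
    for size in range(len(keys) + 1):
        for left in combinations(keys, size):
            part1 = {k: heap[k] for k in left}
            part2 = {k: v for k, v in heap.items() if k not in part1}
            yield part1, part2


def emp(heap):
    """Holds only of the empty partial state."""
    return heap == {}


def byte_at(addr, value):
    """Points-to at BYTE: the state describes exactly this one byte."""
    return lambda heap: heap == {addr: value}


def sep(p, q):
    """Separating conjunction: p and q hold of disjoint parts of the state."""
    return lambda heap: any(p(h1) and q(h2) for h1, h2 in splits(heap))


def dword_at(addr, value):
    """Points-to at DWORD: four separated bytes, little-endian."""
    assertion = emp
    for i in range(4):
        assertion = sep(assertion, byte_at(addr + i, (value >> (8 * i)) & 0xFF))
    return assertion
```

One property worth noticing is that `sep(byte_at(a, v), byte_at(a, v))` never holds, because the same address cannot be owned by both sides.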

Safety

Machine code does not “finish” and so standard Hoare triple does not suit; also, code is mixed up with store. So we define safe k P to mean “runs without faulting for k steps from any state satisfying P.”

Example: tight loop

Example: jmp
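The definition can be tried out on a toy machine. In the Python sketch below a state is just an instruction pointer, the program is a map from addresses to already-decoded instructions, a fault is a step with no successor, and a finite collection of start states stands in for the predicate P. All of this is an illustrative simplification, not the authors' definition of safe.

```python
def step(program, eip):
    """One step of a toy machine; None means the machine faults."""
    instr = program.get(eip)
    if instr is None:
        return None  # nothing executable at this address
    if instr[0] == "JMP":
        return instr[1]
    if instr[0] == "NOP":
        return eip + 1
    return None


def safe(program, k, initial_states):
    """True if every listed start state runs for k steps without faulting."""
    for eip in initial_states:
        for _ in range(k):
            eip = step(program, eip)
            if eip is None:
                return False
    return True
```

A tight loop such as `{0x100: ("JMP", 0x100)}` is safe for every k even though it never finishes, which is the behaviour a Hoare triple about termination states cannot express. Two NOPs followed by unmapped memory are safe for 2 steps but not for 3.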

Specification logic

It’s painful working directly with safe: we must work explicitly with “step-index” k and “frame” R.

Instead, we define a specification logic in which a spec is a set S of pairs such that

In other words, it builds in steps and frames.

Connectives for spec logic

To hide explicit step indices, we use a later connective and the Löb rule:

We define a frame connective

It gives us a “frame rule” for specs, and distributes over other connectives.

Basic blocks

Given our definitions of safety and points-to for instructions, we can mimic Hoare-style triples for basic blocks:

We can then derive familiar rules such as framing:

This is useful when proving straight-line machine code.

Rules for instructions (I): No control flow

Use Hoare-like triple

Rules for instructions (II): Control flow

Explicit CPS-like use of safe

Two possible continuations

Reasoning with labels

We overload “points-to” on assembler programs, so (roughly)

Macros

Our representation of scoped labels makes it easy to define macros that make use of labels internally, and derive rules for them.

Putting it together: A spec for a memory allocator

Trivial implementation of allocator

Proof support

Very painful to work with assertions and specs using only primitive rules.

We have built Coq tactic support for:
- Basic simplification of formulae (AC of *, etc.)
- Pulling out existential quantifiers automatically

Greatly simplifies proving!

Proof demo!

Status

- We can generate and prove correct tiny programs written in “Coq” assembler and a small while-language
- Binary generated by Coq can be run on “raw metal” (booted off a CD!)

Next steps:
- Model of I/O, e.g. screen/keyboard; currently our “observable” is just “faulting”
- High-level model of processes
- Build and verify OS components such as scheduler, allocator, loader
- Eventual aim: process isolation theorem