This One Time, at PL Camp


Summer School on Language-Based Techniques for Integrating with the External World

University of Oregon

Eugene, Oregon

July 2007

Checking Type Safety of Foreign Function Calls

Jeff Foster

University of Maryland

Ensure type safety across languages
• OCaml/JNI – C

Multi-lingual type inference system
• Representational types

SAFFIRE
• Multi-lingual type inference system

Dangers of FFIs

In most FFIs, programmers write “glue code”
• Translates data between host and foreign languages
• Typically written in one of the languages

Unfortunately, FFIs are often easy to misuse
• Little or no checking done at language boundary
• Mistakes can silently corrupt memory
• One solution: interface generators

Example: “Pattern Matching”

if (Is_long(x)) {
  if (Int_val(x) == 0)  /* B */ ...
  if (Int_val(x) == 1)  /* D */ ...
} else {
  if (Tag_val(x) == 0)  /* A */ Field(x, 0) = Val_int(0);
  if (Tag_val(x) == 1)  /* C */ Field(x, 1) = Val_int(0);
}

type t = A of int | B | C of int * int | D
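For orientation, a hedged sketch of what the OCaml side of such a binding might look like; the function name, stub name, and signature are assumptions for illustration, not from the slides:

  type t = A of int | B | C of int * int | D
  (* Nothing at the boundary checks that the C glue above agrees with
     this declared type about t's representation. *)
  external frob : t -> unit = "frob_stub"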

Garbage Collection

C FFI functions need to play nice with the GC
• Pointers from C to the OCaml heap must be registered

value bar(value list) {
  CAMLparam1(list);
  CAMLlocal1(temp);
  temp = alloc_tuple(2);
  CAMLreturn(Val_unit);
}

Easy to forget; difficult to find this error with testing

Multi-Lingual Types

Representational Types
• Embed OCaml types in C types and vice versa

SAFFIRE

Static Analysis of Foreign Function InteRfacEs

Programming Models for Distributed Computing

Yannis Smaragdakis

University of Oregon

NRMI: Natural programming model for distributed computing.

J-Orchestra: Execute unsuspecting programs over a network, using program rewriting.

Morphing: High-level language facility for safe program transformation.

NRMI

The calling sequence, illustrated in the slides with client-side and server-side heap diagrams (diagrams omitted):

1. Identify all reachable objects
2. Execute remote procedure
3. Send back all reachable objects
4. Match reachable maps
5. Update original objects
6. Adjust links out of original objects
7. Adjust links out of new objects
8. Garbage collect

J-Orchestra

Automatic partitioning system
• Works as a bytecode compiler
• Lots of indirection using proxies, interfaces, local and remote objects

Partitioned program is equivalent to the original

Morphing

Ensure program generators are safe
• Statically check the generator to determine the safety of any generated program, under all inputs
• Ensure that generated programs compile

Early approach – SafeGen
• Using theorem provers

MJ
• Using types

Fault Tolerant Computing

David August and David Walker Princeton University

Processors are becoming more susceptible to intermittent faults
• Moore’s Law, radiation
• Alter computation or state, resulting in incorrect program execution

Goal: Build reliable systems from unreliable components.

Topics

Transient faults and mechanisms designed to protect against them (HW).

The role languages and compilers may play in creating radiation-hardened programs.

New opportunities made possible by languages which embrace potentially incorrect behavior.

Causes

Software/Compiler

Duplicate instructions and check at important locations (e.g., stores) [SWIFT, EDDI]

λzap

λ calculus with fault tolerance
• Intermediate language for compilers
• Models a single fault
• Based on replication

Semantics model the type of faults

let x = 2 in
let y = x + x in
out y

With replication:

let x1 = 2 in let x2 = 2 in let x3 = 2 in
let y1 = x1 + x1 in let y2 = x2 + x2 in let y3 = x3 + x3 in
out [y1, y2, y3]

Testing

Typing Ad Hoc Data

Kathleen Fisher

AT&T Labs

PADS project*

• Data Description Language (DDL)
• Data Description Calculus (DDC)
• Automatic inference of PADS descriptions

*http://padsproj.org

PADS

Declarative description of data source:
• Physical format information
• Semantic constraints

type responseCode = { x : Int | 99 < x < 600}

Pstruct webRecord {
  Pip ip; " - - [";
  Pdate(':') date; ":";
  Ptime(']') time; "]";
  httpMeth meth; " ";
  Puint8 code; " ";
  Puint8 size; " ";
};
Parray webLog { webRecord[] };
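As a rough illustration of the responseCode constraint above, a hedged OCaml sketch of the check a generated parser might perform (the function name is made up):

  (* Accept a value only if 99 < x < 600, per responseCode. *)
  let parse_response_code (s : string) : int option =
    match int_of_string_opt s with
    | Some x when 99 < x && x < 600 -> Some x
    | _ -> None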

[Diagram: raw data (email, ASCII log files, binary traces) + data description → end-user tools → XML, CSV, standard formats & schema, visual information]

Learning

Problem: Producing useful tools for ad hoc data takes a lot of time.

Solution: A learning system to generate data descriptions and tools automatically.

Format Inference Engine

[Diagram: input file(s) → chunked data → tokenization → structure discovery → format refinement (with scoring function) → IR-to-PADS printer → PADS description]

Multi-Staged Programming

Walid Taha

Rice University

Writing generic programs that do not pay a runtime overhead
• Use program generators
• Ensure generated code is syntactically well-formed and well-typed

MetaOCaml

The Abstract View

[Diagram: a batch program P takes inputs I1 and I2 together; staging splits it into P1, which consumes I1 and generates P2, which is then run on each I2]

MetaOCaml

Brackets (.< >.)
• Delay execution of an expression

Escape (.~)
• Combine smaller delayed values to construct larger ones

Run (.!)
• Compile and execute the dynamically generated code

Power Example

let rec power (n, x) =
  match n with
  | 0 → 1
  | n → x * (power (n-1, x));;

let power2 (x) = power (2, x);;
let power2 = fun x → power (2, x);;

let power2 (x) = 1*x*x;;

let rec power (n, x) =
  match n with
  | 0 → .<1>.
  | n → .<.~x * .~(power (n-1, x))>.;;

let power2 = .! .<fun x → .~(power (2, .<x>.))>.;;
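As a sanity check (my own unfolding, not from the slides): power (2, .<x>.) produces roughly .<x * (x * 1)>., so the staged power2 behaves like the hand-written 1*x*x version; for example, power2 3 evaluates to 9.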

Scalable Defect Detection

Manuvir Das, Daniel Wang, Zhe Yang

Microsoft Research

Program analysis at Microsoft scale
• Scalability, accuracy

Combination of a weak global analysis and a slow local one (for some regions of code)

Programmers are required to add interface annotations
• Some automatic inference is available

Web and Database Application Security

Zhendong Su

University of California-Davis

Static analyses for enforcing correctness of dynamically generated database queries.

Runtime checking mechanisms for detecting SQL injection attacks;

Static analyses for detecting SQL injection and cross-site scripting vulnerabilities.

XML and Web Application Programming

Anders Møller

University of Aarhus

Formal models of XML schemas
• Expressiveness of DTD, XML Schema, Relax NG

Type checking XML transformation languages
• “Assuming that x is valid according to Sin, is T(x) valid according to Sout?”

Web application frameworks
• Java Servlets and JSP, JWIG, GWT

Types for Safe C-Level Programming

Dan Grossman

University of Washington

Cyclone, a safe dialect of C
• Designed to prevent safety violations (buffer overflows, memory management, …)

Mostly underlying theory
• Types, expressions, memory regions

Analyzing and Debugging Software

Understanding Multilingual Software [Foster]
• Parlez vous OCaml?

Statistical Debugging [Liblit]
• You are my beta tester, and there’s lots of you

Scalable Defect Detection [Das, Wang, Yang]
• Microsoft programs have no bugs

Programming Models

Types for Safe C-Level Programming [Grossman]
• C without the ick factor

Staged Programming [Taha]
• Programs that produce programs that produce programs...

Prog. Models for Dist. Comp. [Smaragdakis]
• We’ve secretly replaced your centralized program with a distributed application. Can you tell the difference?

The Web

Web and Database Application Security [Su]
• How not to be pwn3d by 1337 haxxors

XML and Web Application Programming [Møller]
• X is worth 8 points in Scrabble... let’s use it a lot

Other Really Important Stuff

Fault Tolerant Computing [August, Walker]
• Help, I’ve been hit by a cosmic ray!

Typing Ad Hoc Data [Fisher]
• Data, data, everywhere, but what does it mean?

Statistical Debugging

Ben Liblit

University Of Wisconsin-Madison

Statistical Debugging & Cooperative Bug Isolation
• Observe deployed software in the hands of real end users
• Build statistical models of success & failure
• Guide programmers to the root causes of bugs
• Make software suck less

What’s This All About?

Motivation

“There are no significant bugs in our released software that any significant number of users want fixed.”

– Bill Gates, quoted in FOCUS Magazine

Software Releases in the Real World

[Disclaimer: this may be a caricature.]

Software Releases in the Real World

1. Coders & testers in tight feedback loop
• Detailed monitoring, high repeatability
• Testing approximates reality

2. Testers & management declare “Ship it!”
• Perfection is not an option
• Developers don’t decide when to ship

Software Releases in the Real World

3. Everyone goes on vacation
• Congratulate yourselves on a job well done!
• What could possibly go wrong?

4. Upon return, hide from tech support
• Much can go wrong, and you know it
• Users define reality, and it’s not pretty
  – Where “not pretty” means “badly approximated by testing”

Testing as Approximation of Reality

Microsoft’s Watson error reporting system
• Crash reports from 500,000 separate programs
• x% of software causes 50% of bugs
• Care to guess what x is?

1% of software errors causes 50% of user crashes
Small mismatch ➙ big problems (sometimes)
Big mismatch ➙ small problem? (sometimes!)
• Perfection is not an economically viable option

Real Engineers Measure Things; Are Software Engineers Real Engineers?

“The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong, it usually turns out to be impossible to get at or repair.”

– Douglas Adams, Mostly Harmless

Instrumentation Framework

Bug Isolation Architecture

[Pipeline: program source → compiler + sampler → shipping application → predicate counts and success/failure labels → statistical debugging → top bugs with likely causes]

Each behavior is expressed as a predicate P on program state at a particular program point.

Count how often “P observed true” and “P observed” using sparse but fair random samples of complete behavior.

Model of Behavior


Predicate Injection: Guessing What’s Interesting

Branch Predicates Are Interesting

if (p)
  …
else
  …

if (p)
  // p was true (nonzero)
else
  // p was false (zero)

Syntax yields instrumentation site
Site yields predicates on program behavior
Exactly one predicate true per visit to site

Branch Predicate Counts

Returned Values Are Interesting

n = fprintf(…);

Did you know that fprintf() returns a value?
Do you know what the return value means?
Do you remember to check it?

n = fprintf(…);
// return value < 0 ?
// return value == 0 ?
// return value > 0 ?

Syntax yields instrumentation site
Site yields predicates on program behavior
Exactly one predicate true per visit to site
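A hedged OCaml sketch of the bookkeeping at one such site; the type and function names are illustrative, and the real instrumentation is injected into C programs by the compiler:

  (* One counter triple per call site; exactly one counter bumped per visit. *)
  type site = { mutable lt : int; mutable eq : int; mutable gt : int }

  let observe site n =
    (if n < 0 then site.lt <- site.lt + 1
     else if n = 0 then site.eq <- site.eq + 1
     else site.gt <- site.gt + 1);
    n   (* pass the returned value through unchanged *)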

Returned Value Predicate Counts

Pair Relationships Are Interesting

int i, j, k;
…
i = …;

Pair Relationship Predicate Counts

int i, j, k;
…
i = …;
// compare new value of i with…
// other vars: j, k, …
// old value of i
// “important” constants

Many Other Behaviors of Interest

Assert statements
• Perhaps automatically introduced, e.g. by CCured

Unusual floating point values
• Did you know there are nine kinds?

Coverage of modules, functions, basic blocks, …

Reference counts: negative, zero, positive, invalid

Kinds of pointer: stack, heap, null, …

Temporal relationships: x before/after y

More ideas? Toss them all into the mix!

Observation stream ➙ observation count
• How often is each predicate observed true?
• Removes time dimension, for good or ill

Bump exactly one counter per observation
• Infer additional predicates (e.g. ≤, ≠, ≥) offline

Feedback report is:

1. Vector of predicate counters

2. Success/failure outcome label
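A hedged OCaml sketch of what one such report might look like (field names are illustrative):

  type report = {
    counters : int array;   (* one counter per predicate *)
    failed   : bool;        (* success/failure label for the whole run *)
  }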

Still quite a lot to measure
• What about performance?

Summarization and Reporting


Fair Sampling Transformation

Sampling the Bernoulli Way

Decide to examine or ignore each site…
• Randomly
• Independently
• Dynamically

Cannot be periodic: unfair temporal aliasing
Cannot toss a coin at each site: too slow

Amortized Coin Tossing

Randomized global countdown
• Small countdown ➙ upcoming sample

Selected from a geometric distribution
• Inter-arrival time for a biased coin toss
• How many tails before the next head?
• Mean sampling rate is a tunable parameter

Geometric Distribution

D = mean of distribution = expected sample density

countdown = ⌈ log(rand(0,1)) / log(1 − 1/D) ⌉
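A hedged OCaml sketch of drawing that countdown (not the authors' implementation; d plays the role of D, the mean of the distribution):

  let next_countdown d =
    let u = 1.0 -. Random.float 1.0 in   (* uniform in (0, 1] *)
    max 1 (int_of_float (ceil (log u /. log (1.0 -. 1.0 /. d))))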

Weighing Acyclic Regions

Break CFG into acyclic regions

Each region has:
• Finite number of paths

• Finite max number of instrumentation sites

Compute max weight in bottom-up pass

[Diagram: acyclic region with per-block instrumentation-site counts; maximum region weight 4]

Weighing Acyclic Regions

Clone acyclic regions
• “Fast” variant
• “Slow” variant

Choose at run time

Retain decrements on fast path for now
• Stay tuned…

[Diagram: run-time test of the global countdown against the region weight (“> 4?”) selects the fast or slow variant]

Path Balancing Optimization

Decrements on fast path are a bummer
• Goal: batch them up
• But some paths are shorter than others

Idea: add extra “ghost” instrumentation sites
• Pad out shorter paths
• All paths now equal


Path Balancing Optimization

Fast path is faster
• One bulk counter decrement on entry
• Instrumentation sites have no code at all

Slow path is slower
• More decrements
• Consume more randomness


Optimizations

Identify and ignore “weightless” functions / cycles

Cache global countdown in local variable

Avoid cloning

Static branch prediction at region heads

Partition sites among several binaries

Many additional possibilities…

What Does This Give Us?

Absolutely certain of what we do see
• Subset of dynamic behavior
• Success/failure label for entire run

Uncertain of what we don’t see

Given enough runs, samples ≈ reality
• Common events seen most often
• Rare events seen at proportionate rate

Playing the Numbers Game


Isolating a Deterministic Bug

Hunt for crashing bug in ccrypt-1.2

Sample function return values
• Triple of counters per call site: < 0, == 0, > 0

Use process of elimination
• Look for predicates true on some bad runs, but never true on any good run

Elimination Strategies

Universal Falsehood
• Disregard P if |P| = 0 for all runs
• Likely a predicate that can never be true

Lack of failing coverage
• Disregard all predicates at site S if |S| = 0 for all failed runs
• Site not reached in failing executions

Lack of failing example
• |P| = 0 for all failed executions
• Need not be true for a failure to occur

Successful counterexample
• |P| > 0 on at least one successful run
• Can be true without causing failure
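A hedged OCaml sketch of that filtering step (types and names are made up for illustration; |P| is modeled as a per-run count):

  type outcome = Success | Failure
  type run = { counts : (string * int) list; outcome : outcome }

  let count p r = try List.assoc p r.counts with Not_found -> 0

  (* Keep predicates true on at least one failing run
     and never true on any successful run. *)
  let candidates preds runs =
    List.filter
      (fun p ->
         List.exists (fun r -> r.outcome = Failure && count p r > 0) runs
         && not (List.exists (fun r -> r.outcome = Success && count p r > 0) runs))
      preds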

Winnowing Down the Culprits

1710 counters
• 3 × 570 call sites

1569 zero on all runs
• 141 remain

139 nonzero on at least one successful run

Not much left!
• file_exists() > 0
• xreadline() == 0

[Plot: number of “good” features left (0–140) versus number of successful trials used (0–3000)]

Multiple, Non-Deterministic Bugs

Strict process of elimination won’t work
• Can’t assume program will crash when it should
• No single common characteristic of all failures

Look for general correlation, not perfect prediction

Warning! Statistics ahead!

Ranked Predicate Selection

Consider each predicate P one at a time
• Include inferred predicates (e.g. ≤, ≠, ≥)

How likely is failure when P is true?
• (Technically, when P is observed to be true)

Multiple bugs yield multiple bad predicates

Some Definitions

F(P) = # failing runs with P observed true (count > 0)
S(P) = # successful runs with P observed true (count > 0)

Bad(P) = F(P) / (S(P) + F(P))

Are We Done? Not Exactly!

Bad(f = NULL) = 1.0

Are We Done? Not Exactly!

Predicate (x = 0) is innocent bystander
• Program is already doomed

Bad(f = NULL) = 1.0
Bad(x = 0) = 1.0

Crash Probability

Identify unlucky sites on the doomed path

Background risk of failure for reaching this site, regardless of predicate truth/falsehood

Context(P) = F(P observed) / (S(P observed) + F(P observed))

Isolate the Predictive Value of P

Does P being true increase the chance of failure over the background rate?

Formal correspondence to likelihood ratio testing

Increase(P) = Bad(P) − Context(P)
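Pulling the three formulas together, a hedged OCaml sketch (the parameter names are mine): f_true/s_true count failing/successful runs where P was observed true, and f_obs/s_obs count runs where P was observed at all:

  let ratio f s = float_of_int f /. float_of_int (f + s)
  let bad      ~f_true ~s_true = ratio f_true s_true
  let context  ~f_obs  ~s_obs  = ratio f_obs  s_obs
  let increase ~f_true ~s_true ~f_obs ~s_obs =
    bad ~f_true ~s_true -. context ~f_obs ~s_obs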

Increase Isolates the Predictor

Increase(f = NULL) = 1.0
Increase(x = 0) = 0.0

It Works!

…for programs with just one bug.

Need to deal with multiple bugs
• How many? Nobody knows!

Redundant predictors remain a major problem

Goal: isolate a single “best” predictor for each bug, with no prior knowledge of the number of bugs.

Multiple Bugs: Some Issues

A bug may have many redundant predictors
• Only need one, provided it is a good one

Bugs occur on vastly different scales
• Predictors for common bugs may dominate, hiding predictors of less common problems

Bad Idea #1: Rank by Increase(P)

High Increase but very few failing runs
These are all sub-bug predictors
• Each covers one special case of a larger bug

Redundancy is clearly a problem

Bad Idea #2: Rank by F(P)

Many failing runs but low Increase
Tend to be super-bug predictors
• Each covers several bugs, plus lots of junk

A Helpful Analogy

In the language of information retrieval
• Increase(P) has high precision, low recall
• F(P) has high recall, low precision

Standard solution:
• Take the harmonic mean of both
• Rewards high scores in both dimensions
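For reference (standard formula, not from the slides): the harmonic mean of two scores a and b is 2ab / (a + b), which is large only when both scores are large.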

Rank by Harmonic Mean

Definite improvement
• Large Increase, many failures, few or no successes

But redundancy is still a problem

Redundancy Elimination

One predictor for a bug is interesting
• Additional predictors are a distraction
• Want to explain each failure once

Similar to minimum set-cover problem
• Cover all failed runs with subset of predicates
• Greedy selection using harmonic ranking

Simulated Iterative Bug Fixing

1. Rank all predicates under consideration

2. Select the top-ranked predicate P

3. Add P to bug predictor list

4. Discard P and all runs where P was true
• Simulates fixing the bug predicted by P
• Reduces rank of similar predicates

5. Repeat until out of failures or predicates
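A hedged OCaml sketch of this loop (names are illustrative: score is any ranking such as the harmonic mean above, and true_in says whether a predicate was observed true in a given failing run):

  let rec select score true_in preds failing_runs acc =
    match preds, failing_runs with
    | [], _ | _, [] -> List.rev acc
    | _ ->
      let best =
        List.fold_left
          (fun a p -> if score p failing_runs > score a failing_runs then p else a)
          (List.hd preds) (List.tl preds)
      in
      (* "Fix" the bug predicted by best: drop it and every failing run it explains. *)
      let failing_runs = List.filter (fun r -> not (true_in best r)) failing_runs in
      let preds = List.filter (fun p -> p <> best) preds in
      select score true_in preds failing_runs (best :: acc)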

Not Covered Today

Visualization of Bug Predictors

• Simple visualization may help reveal trends

[Figure: bug predictor visualization annotated with Context(P), Increase(P), error bound, S(P), and log(F(P) + S(P))]

Not Covered Today

Reconstruction of failing paths
• Bug predictor is often the smoking gun, but not always
• Want short, feasible path that exhibits bug

– “Just because it’s undecidable doesn’t mean we don’t need an answer.”
