
Page 1: Inferring Specifications

1

Inferring Specifications

A kind of review

Page 2: Inferring Specifications

2

The Problem

Most programs do not have specifications.
Those that do often fail to preserve consistency between specification and implementation.
Specifications are needed for verification, testing, and maintenance.

Page 3: Inferring Specifications

3

Suggested Solution

Automatic discovery of specifications

Page 4: Inferring Specifications

4

Our Playground

Purpose: verification, testing, promoting understanding
Specification representation: contracts, properties, automata, …
Inference technique: static, dynamic, a combination
Human intervention

Page 5: Inferring Specifications

5

Restrictions and Assumptions

Learning automata from positive traces is undecidable [Gold67]

An executing program is usually “almost” correct

If a miner can identify the common behavior, it can produce a correct specification, even from programs that contain errors

Page 6: Inferring Specifications

6

Perracotta: Mining Temporal API Rules from Imperfect Traces

Jinlin Yang, David Evans (Department of Computer Science, University of Virginia); Deepali Bhardwaj, Thirumalesh Bhat, Manuvir Das (Center for Software Excellence, Microsoft Corp.)

ICSE ‘06

Page 7: Inferring Specifications

7

Key Contribution

Addressing the problem of imperfect traces
Techniques for incorporating contextual information into the inference algorithm
Heuristics for automatically identifying interesting properties

Page 8: Inferring Specifications

8

Perracotta

A dynamic analysis tool for automatically inferring temporal properties

Takes the program's execution traces as input and outputs a set of temporal properties it likely has

[Pipeline: Program → (instrumentation) → Instrumented Program → (testing, with a Test Suite) → Execution Traces → (inference, with Property Templates) → Inferred Properties]

Page 9: Inferring Specifications

9

Property Templates

Name         QRE          Valid Example   Invalid Examples
Response     S*(PP*SS*)*  SPPSS           SPPSSP
Alternating  (PS)*        PSPS            PSS, PPS, SPS
MultiEffect  (PSS*)*      PSS             PPS, SPS
MultiCause   (PP*S)*      PPS             PSS, SPS
EffectFirst  S*(PS)*      SPS             PSS, PPS
CauseFirst   (PP*SS*)*    PPSS            SPSS, SPPS
OneCause     S*(PSS*)*    SPSS            PPSS, SPPS
OneEffect    S*(PP*S)*    SPPS            PPSS, SPSS
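The templates are ordinary regular expressions over the two projected events P (cause) and S (effect), so a projected trace can be checked against them directly. A minimal sketch in Python, assuming that satisfying a template means the whole projected trace matches its QRE:

    import re

    # QRE templates from the table above, over the alphabet P (cause) and S (effect).
    TEMPLATES = {
        "Response":    r"S*(PP*SS*)*",
        "Alternating": r"(PS)*",
        "MultiEffect": r"(PSS*)*",
        "MultiCause":  r"(PP*S)*",
        "EffectFirst": r"S*(PS)*",
        "CauseFirst":  r"(PP*SS*)*",
        "OneCause":    r"S*(PSS*)*",
        "OneEffect":   r"S*(PP*S)*",
    }

    def satisfied_templates(projected_trace):
        """Names of the templates that the projected P/S trace satisfies in full."""
        return [name for name, qre in TEMPLATES.items()
                if re.fullmatch(qre, projected_trace)]

    # satisfied_templates("PSPS") includes "Alternating"; satisfied_templates("PSS") does not.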

Page 10: Inferring Specifications

10

Initial approach

The algorithm is developed for inferring two-event properties (for scalability).

Complexity: O(nL) time and O(n²) space, where n is the number of distinct events and L is the length of the trace.

Each cell of an n×n matrix holds the current state of a state machine that tracks the alternating pattern for that pair of events (a sketch follows below).

Requires perfect traces.
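A minimal sketch of the matrix-of-state-machines idea, restricted to the Alternating template. The state names (expect_P, expect_S, violated) and function name are mine; Perracotta's actual implementation details may differ:

    from itertools import product

    def mine_alternating(trace, events):
        """One-pass check of the Alternating pattern (PS)* for every ordered event pair.

        state[(p, s)] tracks whether the subsequence of p's and s's seen so far
        still matches (PS)*: O(n*L) time, O(n^2) space for n distinct events.
        """
        state = {pair: "expect_P"
                 for pair in product(events, repeat=2) if pair[0] != pair[1]}
        for e in trace:
            for other in events:
                if other == e:
                    continue
                # e plays the cause role P for the pair (e, other)
                cause_state = state.get((e, other))
                if cause_state == "expect_P":
                    state[(e, other)] = "expect_S"
                elif cause_state == "expect_S":
                    state[(e, other)] = "violated"
                # e plays the effect role S for the pair (other, e)
                effect_state = state.get((other, e))
                if effect_state == "expect_S":
                    state[(other, e)] = "expect_P"
                elif effect_state == "expect_P":
                    state[(other, e)] = "violated"
        # the accepting state of (PS)* is "expect_P" (zero or more complete PS pairs)
        return [pair for pair, st in state.items() if st == "expect_P"]

    # mine_alternating("PSPSQ", "PSQ") -> [("P", "S")]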

Page 11: Inferring Specifications

11

Approximate Inference

Partition the trace into sub-traces; for example: PSPSPSPSPSPPPP → PS|PS|PS|PS|PS|PPPP

Compute the satisfaction rate of each template: the ratio between the partitions satisfying the alternating property and the total number of partitions.

Set a satisfaction threshold.
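A small sketch of the satisfaction-rate computation, assuming the trace has already been projected onto one event pair (P/S) and that a new partition starts at each run of P's, which is one plausible reading of the slide's example; the exact partitioning rule is not spelled out here:

    import re

    def alternating_satisfaction_rate(projected_trace):
        """Fraction of partitions of a P/S trace that satisfy the Alternating pattern.

        Partitions are maximal runs of the form P+S*; any leading S's (an effect
        before the first cause) are ignored in this sketch.
        """
        partitions = re.findall(r"P+S*", projected_trace)
        if not partitions:
            return 0.0
        ok = sum(1 for part in partitions if re.fullmatch(r"(PS)*", part))
        return ok / len(partitions)

    # alternating_satisfaction_rate("PSPSPSPSPSPPPP") -> 5/6; the property is reported
    # only if the rate exceeds the chosen satisfaction threshold.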

Page 12: Inferring Specifications

12

Contextual Properties

[Figure: context-neutral vs. context-sensitive analysis of the trace lock1.acq lock2.acq lock2.rel lock1.rel. One view infers no property; the other infers lock1.acq→lock2.acq, lock1.acq→lock2.rel, lock1.acq→lock1.rel, lock2.acq→lock2.rel, lock2.acq→lock1.rel, lock2.rel→lock1.rel. Slicing the trace by lock object (lock1: acq rel; lock2: acq rel) lets inference run on each lock's events separately.]
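A minimal sketch of the slicing step, assuming each trace entry already carries the context it belongs to (here the lock object); the (context, event) trace format is an assumption of the sketch:

    from collections import defaultdict

    def slice_by_context(trace):
        """Split a trace of (context, event) pairs into one sub-trace per context,
        so that two-event inference can run on each slice separately."""
        slices = defaultdict(list)
        for context, event in trace:
            slices[context].append(event)
        return dict(slices)

    # slice_by_context([("lock1", "acq"), ("lock2", "acq"), ("lock2", "rel"), ("lock1", "rel")])
    # -> {"lock1": ["acq", "rel"], "lock2": ["acq", "rel"]}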

Page 13: Inferring Specifications

13

Selecting Interesting Properties: Reachability

Mark a property P→S as probably uninteresting if S is reachable from P in the call graph.

For example, given:

X() { … C(); … D(); … }
A() { … B(); … }

B is reachable from A, so A→B is marked probably uninteresting; the relationship between C and D is not obvious from inspecting either C or D, so C→D remains interesting.
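The heuristic only needs plain reachability over the static call graph. A small sketch, with the call-graph representation (a dict from caller to callees) as an assumption:

    def reachable(call_graph, src, dst):
        """Depth-first reachability in a call graph given as {caller: [callee, ...]}."""
        seen, stack = set(), [src]
        while stack:
            f = stack.pop()
            if f == dst:
                return True
            if f in seen:
                continue
            seen.add(f)
            stack.extend(call_graph.get(f, []))
        return False

    def probably_uninteresting(call_graph, p, s):
        # P -> S is marked probably uninteresting if S is reachable from P
        return reachable(call_graph, p, s)

    # With call_graph = {"A": ["B"], "X": ["C", "D"]}:
    # probably_uninteresting(call_graph, "A", "B") -> True   (A calls B directly)
    # probably_uninteresting(call_graph, "C", "D") -> False  (C -> D stays interesting)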

Page 14: Inferring Specifications

14

Selecting Interesting Properties: Name Similarity

A property is more interesting if it involves similarly named events.

For example: ExAcquireFastMutexUnsafe and ExReleaseFastMutexUnsafe.

Compute word similarity as 2w / (w_p + w_s).
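A sketch of the similarity computation. The symbol reading (w = number of shared words, w_p and w_s = word counts of the two names) and the CamelCase splitting rule are my assumptions drawn from the slide's example; the slide itself only gives the formula:

    import re

    def split_words(name):
        """Split an identifier such as ExAcquireFastMutexUnsafe into its CamelCase words."""
        return [w.lower() for w in re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", name)]

    def name_similarity(p, s):
        """similarity = 2*w / (w_p + w_s), with w = number of words the two names share."""
        wp, ws = split_words(p), split_words(s)
        shared = len(set(wp) & set(ws))
        return 2 * shared / (len(wp) + len(ws))

    # name_similarity("ExAcquireFastMutexUnsafe", "ExReleaseFastMutexUnsafe")
    # -> 2*4 / (5+5) = 0.8   (shared words: ex, fast, mutex, unsafe)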

Page 15: Inferring Specifications

15

Chaining

Connect related Alternating properties into chains: A→B, B→C and A→C imply A→B→C.

Provides a way to compose complex state machines out of many small state machines.

Identifies complex multi-event properties without incurring a high computational cost.
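A small sketch of the chaining rule stated above, assuming the mined Alternating properties are given as a set of ordered pairs; the function name is mine:

    def chain_alternating(pairs):
        """Join A->B and B->C into A->B->C, but only when A->C was also inferred."""
        pairs = set(pairs)
        return [(a, b, c)
                for (a, b) in pairs
                for (b2, c) in pairs
                if b == b2 and (a, c) in pairs]

    # chain_alternating({("A", "B"), ("B", "C"), ("A", "C")}) -> [("A", "B", "C")]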

Page 16: Inferring Specifications

16

SMArTIC: Towards Building an Accurate, Robust and Scalable Specification Miner

David Lo and Siau-Cheng Khoo

Department of Computer Science, National University of Singapore

FSE ‘06

Page 17: Inferring Specifications

17

Hypotheses

Mined specifications will be more accurate when:
(1) erroneous behavior is removed before learning, and
(2) they are obtained by merging specifications learned from clusters of related traces, rather than by learning from the entire set of traces.

Page 18: Inferring Specifications

18

Structure

[Figure: SMArTIC pipeline: Traces → Filtering → Filtered Traces → Clustering → Clusters of Filtered Traces → Learning → Automatons → Merging → Merged Automaton]

Page 19: Inferring Specifications

19

Filtering

How can you tell what’s wrong if you don’t know what’s right?

Filter out erroneous traces based on common behavior

Common behavior is represented by “statistically significant” temporal rules

Page 20: Inferring Specifications

20

Pre → Post Rules

Look for rules of the form a→bc: when a occurs, b must eventually occur after a, and c must also eventually occur after b.

Rules exhibiting high confidence and reasonable support can be considered "statistical" invariants.

Support – the number of traces exhibiting the property pre→post.

Confidence – the ratio of traces exhibiting the property pre→post to those exhibiting the property pre.
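A sketch of the support/confidence computation over a set of traces. The per-trace reading of "exhibits the rule" (the trace contains a, and every occurrence of a is eventually followed by b and then c) and the explicit candidate list are simplifying assumptions of the sketch:

    def mine_pre_post_rules(traces, candidates, min_sup, min_conf):
        """Keep candidate rules a -> bc whose support and confidence clear the thresholds."""
        def exhibits(trace, a, b, c):
            seen_a = False
            for i, e in enumerate(trace):
                if e == a:
                    seen_a = True
                    try:
                        j = trace.index(b, i + 1)   # b must occur after this a
                        trace.index(c, j + 1)       # and c must occur after that b
                    except ValueError:
                        return False
            return seen_a

        rules = []
        for a, b, c in candidates:
            sup = sum(1 for t in traces if exhibits(t, a, b, c))   # traces exhibiting pre -> post
            pre = sum(1 for t in traces if a in t)                 # traces exhibiting pre
            conf = sup / pre if pre else 0.0
            if sup >= min_sup and conf >= min_conf:
                rules.append(((a, b, c), sup, conf))
        return rules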

Page 21: Inferring Specifications

21

Clustering

Convert a set of traces into groups of related traces.
Localizes inaccuracies.
Improves scalability.

Page 22: Inferring Specifications

22

Clustering Algorithm

A variant of the k-medoid algorithm.

Compute the distance between each pair of data items (traces) using a similarity metric; k is the number of clusters to create.

Algorithm: repeatedly increase k until a local maximum is reached.
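A plain k-medoids sketch for a fixed k, parameterised by the trace-distance function described on the next slide; SMArTIC's variant additionally wraps this in an outer loop that keeps increasing k until a clustering-quality score reaches a local maximum. The names and iteration cap are mine:

    import random

    def k_medoids(traces, k, dist, iters=20):
        """Cluster traces around k medoid traces using the pairwise distance `dist`."""
        medoids = random.sample(range(len(traces)), k)
        clusters = {}
        for _ in range(iters):
            # assign every trace to its nearest medoid
            clusters = {m: [] for m in medoids}
            for i in range(len(traces)):
                nearest = min(medoids, key=lambda m: dist(traces[i], traces[m]))
                clusters[nearest].append(i)
            # move each medoid to the member minimising total distance within its cluster
            new_medoids = [min(members,
                               key=lambda c: sum(dist(traces[c], traces[o]) for o in members))
                           if members else m
                           for m, members in clusters.items()]
            if set(new_medoids) == set(medoids):
                break
            medoids = new_medoids
        return clusters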

Page 23: Inferring Specifications

23

Similarity Metric

Use a global sequence alignment algorithm.

Problem: it does not work well in the presence of loops.

Solution: compare the regular-expression representation of the traces instead.

[Figure: example alignment of FTFTALILLAVAV with F--TAL-LLA-AV; example traces ABCBCDABCBCBCD vs. ABCD, whose regular-expression forms are (A(BC)+D)+ and ABCD]
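A minimal Needleman-Wunsch sketch for the global-alignment step. The match/mismatch/gap scores are placeholders, and only the alignment score is returned; the slide's fix for loops (comparing regular-expression representations) is not shown:

    def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
        """Needleman-Wunsch global-alignment score between two event sequences."""
        n, m = len(s), len(t)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = dp[i - 1][0] + gap
        for j in range(1, m + 1):
            dp[0][j] = dp[0][j - 1] + gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                diag = dp[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
                dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
        return dp[n][m]

    # e.g. global_alignment_score("ABCBCD", "ABCD") scores the best end-to-end alignment;
    # a distance for clustering can be derived by negating or normalising this score.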

Page 24: Inferring Specifications

24

Learning

Learn PFSAs from the clusters of filtered traces, one PFSA per cluster.

The learner is a "placeholder"; in the current experiment the sk-strings learner is used.

Page 25: Inferring Specifications

25

Merging

Merge multiple PFSAs into one.

The merged PFSA accepts exactly the union of the sentences accepted by the input PFSAs.

Ensures probability integrity.

Probability of a transition δ in the output PFSA A: p_A(δ) = Σ_{i=1..k} w_i · p_{A_i}(δ)
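A sketch of the transition-probability formula above, assuming each input PFSA is summarised as a dict from transition to probability and that the weights w_i are given and sum to 1; how SMArTIC chooses the w_i is not stated on the slide:

    def merged_transition_prob(pfsa_probs, weights, delta):
        """p_A(delta) = sum_i w_i * p_{A_i}(delta) for a transition delta of the merged PFSA.

        pfsa_probs: list of dicts mapping a transition to its probability in PFSA A_i
        weights:    list of w_i, assumed to sum to 1
        """
        return sum(w * probs.get(delta, 0.0) for w, probs in zip(weights, pfsa_probs))

    # Example: two PFSAs both assign probability to the transition (s0, "acq", s1):
    # merged_transition_prob([{("s0", "acq", "s1"): 0.5}, {("s0", "acq", "s1"): 1.0}],
    #                        [0.6, 0.4], ("s0", "acq", "s1"))   # -> 0.7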

Page 26: Inferring Specifications

26

From Uncertainty to Belief: Inferring the Specifications Within

Ted Kremenek, Paul Twohey, Andrew Y. Ng, Dawson Engler (Computer Science Dept., Stanford University)

Godmar Back (Computer Science Dept., Virginia Tech)

OSDI ‘06

Page 27: Inferring Specifications

27

Motivating Example

Problem: inferring ownership roles.
Ownership idiom: at any time, a resource has exactly one owning pointer.
Infer annotations:
ro – returns ownership
co – claims ownership
Is fopen an ro? Are fread and fclose co?

FILE* fp = fopen("myfile.txt", "r");
fread(buffer, n, 1000, fp);
fclose(fp);

Page 28: Inferring Specifications

28

Basic Ownership Rules

fp = ro(); ¬co(fp); co(fp);
fp = ¬ro(); ¬co(fp); ¬co(fp);

[Figure: DFA over these events with states Uninit, Owned, ¬Owned, Claimed, OK and Bug, and transitions labelled ro, ¬ro, co, ¬co, any use and end-of-path]

Page 29: Inferring Specifications

29

Goal

Provide a framework that:

1. Allows users to easily express every intuition and domain-specific observation they have that is useful for inferring annotations

2. Reduces such knowledge in a sound way to meaningful probabilities ("common currency")

Page 30: Inferring Specifications

30

Annotation Inference

1. Define the set of possible annotations to infer

2. Model domain-specific knowledge and intuitions in the probabilistic model

3. Compute annotation probabilities

Page 31: Inferring Specifications

31

Factors – Modeling Beliefs

Relations mapping the possible values of one or more annotation variables to non-negative real numbers.

For example, CheckFactor encodes the belief that any random place might have a bug 10% of the time; set <bug> = 0.1, <ok> = 0.9:

f<check> = <ok> if DFA = OK, <bug> if DFA = Bug
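A tiny sketch of the CheckFactor above; the function name and the string-valued DFA outcome are representation assumptions:

    def check_factor(dfa_result, ok_weight=0.9, bug_weight=0.1):
        """f<check>: belief weight for one behavioral check, given the DFA outcome.
        The 0.9/0.1 values encode the prior that a random place has a bug ~10% of the time."""
        return ok_weight if dfa_result == "OK" else bug_weight

    # The unnormalised score of a full annotation assignment is the product of all its
    # factor values; normalising over all assignments yields the probabilities P(A).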

Page 32: Inferring Specifications

32

Factors

Other factors:
bias toward specifications with ro without co
factors based on naming conventions
…

Page 33: Inferring Specifications

33

Annotation Factor Graph

[Figure: factor graph with annotation variables fopen:ret, fread:4, fclose:1, fdopen:ret, fwrite:4; prior-belief factors f<ro>, f<co>, f<co>, f<ro>, f<co> attached to the individual variables; behavioral-test factors f<check> connecting the variables exercised together by each check]

Page 34: Inferring Specifications

34

Results

fopen:ret  fread:4  fclose:1  DFA  f<check>  f<co>(fread:4)  P(A)
ro         ¬co      co        +    0.9       0.7             0.483
¬ro        ¬co      co        +    0.9       0.7             0.282
ro         ¬co      co        -    0.1       0.7             0.125
ro         co       ¬co       -    0.1       0.3             0.054
ro         co       ¬co       -    0.1       0.3             0.023
¬ro        ¬co      co        -    0.1       0.7             0.013
¬ro        co       ¬co       -    0.1       0.3             0.013
¬ro        co       ¬co       -    0.1       0.3             0.006

Page 35: Inferring Specifications

35

QUARK: Empirical Assessment of Automaton-based Specification Miners

David Lo and Siau-Cheng Khoo

Department of Computer Science, National University of Singapore

WCRE ‘06

Page 36: Inferring Specifications

36

QUARK Framework

Assessing the quality of specification miners.
Measure performance along multiple dimensions:
Accuracy – the extent to which the inferred specification is representative of the actual specification
Scalability – the ability to infer large specifications
Robustness – sensitivity to errors

Page 37: Inferring Specifications

37

QUARK Framework

[Figure: a simulator model (PFSA) drives a trace generator; the generated traces are fed to a user-defined miner; quality assessment compares the mined result against the simulator model and outputs measurements]

Page 38: Inferring Specifications

38

Accuracy (Trace Similarity)

Measured in the absence of errors.

Metrics:

Recall – correct information that can be recollected by the mined model

Precision – correct information that can be produced by the mined model

Co-emission – probability similarity

Page 39: Inferring Specifications

39

Robustness

Sensitivity to errors.

"Inject" error nodes and error transitions into the PFSA model.

[Figure: a PFSA with states start, 1, 2, 3, 4 and end, transitions labelled A, B, C, D, E, F, H, extended with an injected error state and error transitions labelled Z]

Page 40: Inferring Specifications

40

Scalability

Use synthetic models:
Build a tree from a pre-determined number of nodes.
Add loops based on "locality of reference".
Assign equal probabilities to the transitions leaving the same node.
Vary the size of the model (nodes, transitions).
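A rough sketch of such a synthetic-model generator. The back-edge rule standing in for "locality of reference", the loop probability and the label alphabet are all assumptions, not QUARK's actual procedure:

    import random

    def synthetic_pfsa(num_nodes, loop_prob=0.2, alphabet="ABCDEFGH"):
        """Random tree-shaped PFSA with optional back-edges as loops and equal
        probabilities on the transitions leaving each state."""
        edges = []                                    # (src, label, dst)
        for node in range(1, num_nodes):
            parent = random.randrange(node)           # tree edge from an earlier node
            edges.append((parent, random.choice(alphabet), node))
            if random.random() < loop_prob:           # loop back to a nearby ancestor
                edges.append((node, random.choice(alphabet), parent))
        out_degree = {}
        for src, _, _ in edges:
            out_degree[src] = out_degree.get(src, 0) + 1
        probs = {edge: 1.0 / out_degree[edge[0]] for edge in edges}
        return edges, probs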