Techniques for Evolution-Aware Runtime Verificationmir.cs.illinois.edu/legunsen/slides/ICST-2019.pdf · Techniques for Evolution-Aware Runtime Verification Owolabi Legunsen, Yi Zhang,

Techniques for Evolution-AwareRuntime Verification

Owolabi Legunsen, Yi Zhang, Milica Hadži-Tanović,

Grigore Roșu, Darko Marinov

ICST 2019

4/26/2019

CCF-1421503, CCF-1421575,CCF-1763788, CNS-1619275,CNS-1646305, CNS-1740916

Runtime Verification (RV)

• RV dynamically checks program executions against formal properties, whose violations can help find bugs• a.k.a. runtime monitoring, runtime checking, monitoring-oriented

programming, typestate checking, etc.

• RV has been around for decades, now has its own conference (RV)

•Many RV tools:

2

JavaMOP: a representative RV tool

Code + TestsJavaMOP

Violations

Properties

3

Example property: Collection_SynchronizedCollection (CSC)

4

…65: im = Collections.synchronizedList(…);66: for (IInvokedMethod iim : im) { … }…

SuiteHTMLReporter

TestOnClassListener

TestNG example: from RV of test executions to bugs

JavaMOP

CSC was violated on… SuiteHTMLReporter.java:66… asynchronized collec�on was accessed in thread−unsafe manner

Violations

…CSC

Manual inspection: multiple threads can access “im”

5

RV during Continuous Integration (CI)?

Developers

Version Control

Co

mm

it

Ch

ange

s

1

2

5

Fetch Changes

6 Release/Deploy

Builds per day:• Facebook: 60K*• Google: 17K• HERE: 100K• Microsoft: 30K• Single open-source

projects: up to 80

Releases per day• Etsy: 50

6

CI Server

?

Pass/Fail

* Android only; Facebook: https://bit.ly/2CAPvN9 ; Google: https://bit.ly/2SYY4rR ;HERE: https://oreil.ly/2T0EyeK ; Microsoft: https://bit.ly/2HgjUpw ; Etsy: https://bit.ly/2IiSOJP ;

• Observation: All prior RV techniques are evolution-unaware (Base RV)

• Base RV would re-incur entire overhead if re-run after each code change

Developers

Version Control

Co

mm

it

Ch

ange

s

1

2

5

6 Release/Deploy

7

CI Server

?

Pass/Fail

New Idea: Focus RV on code changes?

Code changes are typically very small relative to entire code base

Fetch Changes

0.97% of classes changed on average in our experiments

Contribution: Evolution-aware Runtime Verification

• Goal: leverage software evolution to scale RV better during testing

• Intended benefits:1. Reduce accumulated runtime overhead of RV across multiple program versions

2. Show developers only new violations after code changes

• Complementary to techniques that improve RV on single program versions• Faster RV algorithms for single program versions

• Running tests in parallel

• Improve properties to have fewer false alarms

8

How JavaMOP works

9

Code+

Tests

InstrumentationInstrumented Code + Tests

Execution

Monitors

Events

Violations

Properties

CSC

Collections.synchronizedList()Collection+.iterator()

We proposed three evolution-aware RV techniques

1. Regression Property Selection (RPS)• Re-monitors only properties that can be violated in parts of code affected by changes

2. Violation Message Suppression (VMS)• Shows only new violations after code changes

3. Regression Property Prioritization (RPP)• Splits RV into two phases:• critical phase: check properties more likely to find bugs on developer’s critical path • background phase: monitor other properties outside developer’s critical path

10The three techniques can be used together

Evolution-aware RV in JavaMOP

11

Code+

Tests

InstrumentationInstrumented Code + Tests

Execution

Monitors

Events

Violations

Properties

Code+

Tests

1. Regression Property Selection (RPS)

2. Violation Message Suppression (VMS)

3. Regression Property Prioritization (RPP)

Evolution-aware RV – Result Overview

• RPS and RPP significantly reduced accumulated runtime overhead of Base RV

• Average: from 9.4x to 1.8x

• Maximum: from 40.5x to 4.2x

• VMS showed 540x fewer violations than Base RV

• RPS did not miss any new violation after code changes

• In theory can miss, but empirically it did not

• See paper for details on VMS and RPP

12

Base RV during software evolution

13

A

TC TE

Code

Tests P2

P1

CSC

Properties

• Base RV re-monitors all properties after every code change• No knowledge of dependencies in the code, or between code and properties

Old Version: monitor CSC, P1, P2

New Version: re-monitor CSC, P1, P2

B

C

D

E

B

Δ = {B}

Regression Property Selection (RPS) Overview

Selected subset of properties are those that may generate new violations

14

RPSOld version of Code+Tests

All available properties

Subset of all available properties

New version of Code+Tests

Regression Property Selection (RPS) – step 1

15

A

TC

B D

E

TE

C

B

Re-monitors only properties that can be violated in parts of code affected by changes

Inheritance or Use

P2

P1

May Generate events for

CSCStep 1a: Build Class Dependency Graph (CDG) for new version

Step 1b: Map classes to properties for which the classes may generate events

Δ = {B}

Regression Property Selection (RPS) – step 2

16

A

TC

B D

E

TE

C

B


Inheritance or Use

P2

P1


CSCAffected classes: those that generate events that can lead to new violations after code changes

Step 2: Compute affected classes

Class X is affected if

1. X changed or is newly added

2. X transitively depends on a changed class, or

3. Class Y that satisfies (1) or (2) can transitively pass data to X

C

TC

D

A

Δ = {B}

Regression Property Selection (RPS) –Steps 3 & 4

17

A

TC

B D

E

TE

C

B


Inheritance or Use

P2

P1


CSC

C

TC

D

Step 3: Select affected properties – those for which affected classes may generate events

Step 4: Re-monitor affected properties: {CSC, P1}

• P2 is NOT re-monitored in the new version• Affected classes cannot generate P2 events• Saves time to monitor P2; does not show old P2 violations

A

Δ = {B}

Total RPS time must be less than Base RV time

18

Step 2: Compute affected classes

Step 3: Select affected properties

Step 4: Re-monitor only affected properties

Step 1a: Build Class Dependency Graph (CDG) for new version

Step 1b: Map classes to properties for which they may generate eventsAnalysis

Re-monitoring

Base RV (Re-monitor all properties)

Analysis Re-monitoringTime Savings

Total Time for RPS

Static and Fast

4.3% of RPS time

RPS Safety and Precision - Definitions

• Evolution-aware RV is safe if it finds all new violations that Base RV finds

• Evolution-aware RV is precise if it finds only new violations that Base RV finds

• RPS discussed so far is safe but not precise• Safe modulo CDG completeness, test-order dependencies, dynamic language features

19

Results of Safe RPS – ps1

20How can we improve these results?

• 20 versions each of 10 GitHub projects• Average project size: 50 KLOC• Average test running time without RV: 51 seconds

RPS variants that use fewer affected classesGoal: Reduce RV overhead by varying “what” set of affected classes is used to select properties

A

TC

B D

E

TE

C

B

Inheritance or Use

P2

P1


CSC

C

TC

D

A What classes are used to select properties?

ps1

Changed classes (i.e., Δ)

Dependents of Δ

Dependees of Δ

Dependees of Δ’s Dependents

ps2

ps3

Δ = {B}

21

Using fewer affected classes can be (un)safe, e.g., ps2

A

TC

B D

E

TE

C

B

Inheritance or Use

P2

P1


CSC

C

TC

D

Aclass D {

static void foo(boolean b) {if (b) { // P1 events}else { // No P1 events}

}}

class C {void getF() {

D.foo(B.b);}}

class B {- public static boolean b = false;

+ public static boolean b = true; }Δ = {B}

22

ps2 can be safe if C does not pass data to D

RPS variants that instrument fewer classesGoal: Reduce RV overhead by varying “where” selected properties are instrumented

A

TC

B D

E

TE

C

B

Inheritance or Use

P2

P1


CSC

C

TC

D

A Where selected properties are instrumented (i ∈ {1,2,3})

psi

affected(Δ)

affected(Δ)c

third-party libraries

ps��

ps��

ps��

Δ = {B}

23

• have fewer violations• ~36% of RV overhead• excluding them can be safe

RPS Variants – Expected Efficiency/Safety Tradeoff

24

“more efficient than” “less safe than”

2 Strong RPS variants are safe under certain assumptions: �� and ��

10 Weak RPS variants are unsafe; they trade safety for efficiency

RPS Results – Runtime Overhead

25Base RV RPS Variants

26

Base RV RPS Variants

RPS Results – Violations Reported

RPS Results – precision and safety

• VMS is precise – it shows only new violations• RPS is not precise – it shows two orders of magnitude more violations than VMS

• We manually confirmed whether all RPS variants find all violations from VMS

• Surprisingly, all weak RPS variants were safe in our experiments

27

Why weak RPS variants were safe in our experiments

• 75% of event traces observed by monitors involved only one class

• 32 of 33 new violations were due to changes whose effects are in ps3

• Additional scenarios captured by ps1 and ps2 did not lead to new violations

• We may have missed old violations when not tracking ps1 or ps2 scenarios

• 87% of old violations missed by excluding third-party libraries did not involve any event from the code

28

Regression Property Prioritization (RPP)

Combining RPS+RPP reduced RV overhead to 1.8x (from 9.4x) 29

All properties

M

N

M+1

N - 1

V1 V2 V3Criticalphase

Backgroundphase

…

…

Conclusion

• We proposed three evolution-aware RV techniques: RPS, VMS, RPP

• Our techniques reduced Base RV overhead from 9.4× to as low as 1.8×

• Taking evolution into account can significantly reduce Base RV overhead during software evolution

30

Owolabi Legunsen: [email protected] Zhang: [email protected] Hadži-Tanović: [email protected]

Grigore Roșu: [email protected] Marinov: [email protected]

Documents

Techniques for Evolution-Aware Runtime Verificationmir.cs.illinois.edu/legunsen/slides/ICST-2019.pdf · Techniques for Evolution-Aware Runtime Verification Owolabi Legunsen, Yi Zhang,