EPVF: AN ENHANCED PROGRAM VULNERABILITY FACTOR METHODOLOGY FOR CROSS-LAYER RESILIENCE ANALYSIS

Bo Fang☨, Qining Lu☨, Karthik Pattabiraman☨, Matei Ripeanu☨ and Sudhanva Gurumurthi*

☨The University of British Columbia, Canada; *Cloud Innovation Lab, IBM, USA

EPVF: AN ENHANCED PROGRAM VULNERABILITY FACTOR …blogs.ubc.ca/karthik/files/2016/06/DSN2016-presentation... · 2016-06-29 · EPVF: AN ENHANCED PROGRAM VULNERABILITY FACTOR METHODOLOGY


What are we facing?

§ SoC soft error trends: overall FIT rate per SoC is increasing [DATE 2014, Chandra, AMD]

[Figure (DATE 2014): "Bitcell SER FIT rate per node" (SCU avg/node, MCU avg/node; y-axis 0 to 700) and "SoC SER FIT rate per node" (memory SER, logic SER; log-scale y-axis 1 to 1000), both with an x-axis running from 200 to 0 (technology node).]

Even though the SER sensitivity per memory bitcell is decreasing, the overall FIT rate per SoC is increasing.

Source: iRoC


Why Software-based Fault Tolerance?

§ Hardware-based techniques

[Figure: system stack (Device/Circuit Level, Architectural Level, Operating System Level, Application Level), from hardware faults to impactful errors.]

§ Software-based techniques: more cost-effective


Mitigating Silent Data Corruption (SDC): Key to Error Resilience

[Figure: a fault during normal execution is either benign or becomes an error; an error leads to a crash, a hang, or an SDC (incorrect output).]
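The outcome categories in the figure can be made concrete with a small classifier of the kind a fault-injection harness applies to each run. This is a minimal sketch under stated assumptions. The names `Outcome` and `classify_outcome`, their parameters, and the rule that termination by a signal counts as a crash are all hypothetical. None of them are taken from LLFI or from the ePVF tool.

```python
import signal
from enum import Enum
from typing import Optional


class Outcome(Enum):
    BENIGN = "benign"
    CRASH = "crash"
    HANG = "hang"
    SDC = "sdc"


def classify_outcome(golden_output: bytes, run_output: bytes,
                     term_signal: Optional[int], timed_out: bool) -> Outcome:
    """Classify one fault-injection run against the fault-free (golden) run.

    Hypothetical rules: a timeout is a hang, termination by a signal
    (e.g. SIGSEGV) is a crash, and otherwise differing output is an SDC.
    """
    if timed_out:
        return Outcome.HANG
    if term_signal is not None:
        return Outcome.CRASH
    if run_output != golden_output:
        # The program finished normally but produced incorrect output.
        return Outcome.SDC
    return Outcome.BENIGN


if __name__ == "__main__":
    golden = b"42\n"
    print(classify_outcome(golden, b"42\n", None, False))                 # Outcome.BENIGN
    print(classify_outcome(golden, b"41\n", None, False))                 # Outcome.SDC
    print(classify_outcome(golden, b"", int(signal.SIGSEGV), False))      # Outcome.CRASH
```

The order of the checks matters. A run that was killed produces truncated output, so it must be classified as a crash or a hang before its output is compared with the golden output.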


Error Resilience Estimation: Accuracy vs. Cost

[Figure: accuracy vs. cost of error resilience estimation methods, with the goal of this work marked.]

§ FI: high resource consumption, low predictive power
§ AVF/PVF [HPCA2010, MICRO2003]: conservative estimation of error resilience


Identifying SDC-causing Bits

§ AVF/PVF: identify Architecturally Correct Execution (ACE) bits [MICRO03, HPCA10]
§ e(nhanced)PVF: a methodology that separates the crash-causing bits from the rest of the ACE bits

[Figure: total bits for execution ⊃ ACE bits, which divide into SDC-causing bits and crash-causing bits.]


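PVF Analysis [Sridharan, HPCA'10]

§ ACE Bits = $\sum_{R_i \in \mathrm{ACE}} \mathrm{Bits}(R_i)$
§ Total Bits = $\sum_{R_i} \mathrm{Bits}(R_i)$
§ PVF = $\frac{\text{ACE Bits}}{\text{Total Bits}}$ = 88.9% for the example below

R1 = LD R2
R4 = ADD R1, R3
R5 = ADD R6*4, R7
ST R4, R5
R8 = LD R2

[Figure: dynamic dependence graph of the example code, with nodes for registers R1 to R8 and the LD, ADD, and ST operations.]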
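The slide's PVF calculation can be approximated in a few lines. The procedure walks backward from the values the output depends on, marks every value it reaches as ACE, and divides the ACE bits by the total bits. This minimal sketch rests on several assumptions that the slide does not state:

- Every value is 32 bits wide.
- The intermediate R6*4 counts as its own node.
- The store is the only output.
- The names `Value`, `example_ddg`, `ace_values`, and `pvf` are hypothetical.

Under these assumptions, R8 is the only non-ACE value, which gives 8/9 ≈ 88.9%. This is consistent with the slide, although the slide does not say how its count was made.

```python
from dataclasses import dataclass, field


@dataclass
class Value:
    """One value (node) in a dynamic dependence graph."""
    name: str
    inputs: list = field(default_factory=list)  # names of the values it is computed from
    bits: int = 32                              # assumption: every value is 32 bits wide


def example_ddg():
    """Hypothetical DDG for the five-instruction example on this slide."""
    values = [
        Value("R2"), Value("R3"), Value("R6"), Value("R7"),
        Value("R1", ["R2"]),            # R1 = LD R2 (R2 is the address)
        Value("R4", ["R1", "R3"]),      # R4 = ADD R1, R3
        Value("R6*4", ["R6"]),          # assumption: R6*4 is its own node
        Value("R5", ["R6*4", "R7"]),    # R5 = ADD R6*4, R7
        Value("R8", ["R2"]),            # R8 = LD R2, never used afterwards
    ]
    sinks = ["R4", "R5"]                # ST R4, R5 consumes the data R4 and the address R5
    return {v.name: v for v in values}, sinks


def ace_values(ddg, sinks):
    """Backward slice from the sinks: every value that can reach the output is ACE."""
    ace, worklist = set(), list(sinks)
    while worklist:
        name = worklist.pop()
        if name not in ace:
            ace.add(name)
            worklist.extend(ddg[name].inputs)
    return ace


def pvf(ddg, sinks):
    """PVF = ACE bits / total bits over all values in the graph."""
    ace_bits = sum(ddg[name].bits for name in ace_values(ddg, sinks))
    total_bits = sum(v.bits for v in ddg.values())
    return ace_bits / total_bits


if __name__ == "__main__":
    ddg, sinks = example_ddg()
    print(sorted(set(ddg) - ace_values(ddg, sinks)))  # ['R8']
    print(f"PVF = {pvf(ddg, sinks):.1%}")              # PVF = 88.9%
```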


Our Approach: ePVF

§ Source of crashes
  § Segmentation faults (99% of crashes are due to segfaults)
§ Direct crash-causing bits
  § Crash model
§ Indirect crash-causing bits
  § Propagation model

[Figure: the example dynamic dependence graph; chart of the source of crashes (segfaults vs. others).]


Overall methodology

§ Steps: PVF (identify ACE bits), obtaining the program trace, crash model, propagation model
§ Crash model: identify the bits that cause a program to make an invalid memory access and crash
§ Propagation model: identify the bits on the backward slice of the bits that directly cause crashes


Crash model

§ Determines the bits that cause an out-of-bounds memory access
§ Applied on every memory instruction, e.g., R1 = LD R2 in the example code
§ The access is valid only if R2 ∈ [addr_min, addr_max]; the model finds the bits of R2 (e.g., 01110001010010…) whose corruption would take the address outside this range
§ The bounds come from OS information: the vma_start and vma_end of the memory regions, and the stack pointer (ESP)
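The crash model's check for a single memory instruction can be sketched as follows. The valid regions stand in for the vma_start/vma_end bounds that the slide obtains from the OS. On Linux, mapped regions are listed in /proc/<pid>/maps. The following are simplifying assumptions, not the ePVF implementation:

- The function names are hypothetical.
- Addresses are 64 bits wide.
- Any address outside a mapped region segfaults, and permissions and alignment are ignored.

```python
def is_mapped(addr, regions):
    """True if addr falls inside any valid inclusive [start, end] region."""
    return any(start <= addr <= end for start, end in regions)


def crash_causing_bits(addr, regions, width=64):
    """Bit positions of addr whose single-bit flip yields an unmapped address.

    Assumption: an access crashes (segfaults) exactly when the corrupted
    address lies outside every valid region.
    """
    return [bit for bit in range(width)
            if not is_mapped(addr ^ (1 << bit), regions)]


if __name__ == "__main__":
    # Hypothetical layout: one heap-like region and one stack-like region.
    regions = [(0x1000, 0x1FFF), (0x7FF0_0000, 0x7FFF_FFFF)]
    bits = crash_causing_bits(0x1000, regions)   # R2 = 0x1000 in R1 = LD R2
    print(len(bits), bits[:3])                   # 52 [12, 13, 14]
```

In this layout, flipping any of the low 12 bits keeps the address inside the 4 KiB region, so only bits 12 to 63 are crash-causing.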


Propagation model

§ Identifies all possible bits that can affect the bits identified by the crash model
§ Example: for ST R4, R5, the crash model gives the valid address range [min(R5), max(R5)]. Since R5 = R6*4 + R7, the range propagates backward to each operand, holding the other operand at its observed value:
  $\max(R6) = (\max(R5) - R7)/4$, $\min(R6) = (\min(R5) - R7)/4$
  $\max(R7) = \max(R5) - R6 \cdot 4$, $\min(R7) = \min(R5) - R6 \cdot 4$
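The backward step for R5 = ADD R6*4, R7 can be sketched by turning the slide's min/max formulas into code. The resulting ranges are then fed to the same per-bit range test the crash model uses. This minimal sketch makes the following assumptions:

- Values are unsigned integers.
- The scaled add does not overflow.
- The inclusive range for R5 is already known from the crash model.
- The names `propagate_scaled_add` and `bits_leaving_range` are hypothetical.

Integer division is rounded inward, a detail the slide's formulas leave implicit.

```python
def ceil_div(a, b):
    """Integer ceiling division for b > 0 (also correct for negative a)."""
    return -((-a) // b)


def propagate_scaled_add(r5_range, r6, r7, scale=4):
    """Back-propagate a valid range for R5 = R6*scale + R7 to R6 and R7.

    Each operand's range is derived while holding the other operand at its
    observed value, mirroring the slide's min/max formulas. Rounding inward
    keeps only integer operand values whose R5 stays inside the range.
    """
    lo5, hi5 = r5_range
    r6_range = (ceil_div(lo5 - r7, scale), (hi5 - r7) // scale)
    r7_range = (lo5 - r6 * scale, hi5 - r6 * scale)
    return r6_range, r7_range


def bits_leaving_range(value, valid, width=64):
    """Bit positions whose flip moves value outside the inclusive range valid."""
    lo, hi = valid
    return [b for b in range(width) if not lo <= (value ^ (1 << b)) <= hi]


if __name__ == "__main__":
    r6, r7 = 16, 0x1000                          # index and base: R5 = 16*4 + 0x1000 = 0x1040
    r6_range, r7_range = propagate_scaled_add((0x1000, 0x1FFF), r6, r7)
    print(r6_range, r7_range)                    # (0, 1023) (4032, 8127)
    print(bits_leaving_range(r6, r6_range)[:2])  # [10, 11]
    print(bits_leaving_range(r7, r7_range)[:2])  # [12, 13]
```

The resulting bit lists are the indirect crash-causing bits of R6 and R7. Flipping any of them pushes R5 outside the valid address range.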


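Overall ePVF methodology

§ Steps: PVF (identify ACE bits), obtaining the program trace, crash model, propagation model
§ ePVF: the bits that potentially lead to SDCs, i.e., the ACE bits minus the crash-causing bits found by the crash and propagation models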
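The relationship between PVF and ePVF reduces to set arithmetic over bits. This minimal sketch assumes that bits are identified by hypothetical (value, bit position) pairs. It also assumes that the crash and propagation models have already produced the set of crash-causing bits. The toy numbers are illustrative and are not results from the slides.

```python
def pvf(total_bits, ace_bits):
    """PVF: fraction of all bits that are ACE."""
    return len(ace_bits) / len(total_bits)


def epvf(total_bits, ace_bits, crash_bits):
    """ePVF: fraction of all bits that are ACE but not crash-causing.

    Each argument is a set of (value, bit_position) pairs.
    """
    sdc_prone = ace_bits - crash_bits
    return len(sdc_prone) / len(total_bits)


if __name__ == "__main__":
    # Toy numbers: 9 values of 32 bits, the first 8 are ACE, and bits 12..31
    # of value 7 (say, a store address) are crash-causing.
    total = {(v, b) for v in range(9) for b in range(32)}
    ace = {(v, b) for v in range(8) for b in range(32)}
    crash = {(7, b) for b in range(12, 32)}
    print(f"PVF  = {pvf(total, ace):.1%}")           # PVF  = 88.9%
    print(f"ePVF = {epvf(total, ace, crash):.1%}")   # ePVF = 81.9%
```

Because the crash-causing bits are removed from the ACE set, ePVF can never exceed PVF. This is why ePVF gives a tighter upper bound on the SDC rate.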


Experimental setup

§ Scientific benchmarks
  § 8 from Rodinia [IISWC09]
  § Matrix Multiplication
  § LULESH: DOE proxy app [IPDPS2013]
§ Fault model
  § LLFI [DSN14]
§ 3,000 runs per benchmark
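The statistical precision that 3,000 runs per benchmark buys can be gauged with a normal-approximation error margin on the observed rate of an outcome such as SDC. This is a generic sketch, not a computation from the slides. The function name `rate_with_margin` is hypothetical.

```python
import math


def rate_with_margin(outcome_count, runs, z=1.96):
    """Observed outcome rate and its normal-approximation margin (~95% for z=1.96)."""
    p = outcome_count / runs
    margin = z * math.sqrt(p * (1 - p) / runs)
    return p, margin


if __name__ == "__main__":
    # Worst case (p = 0.5) for 3,000 runs per benchmark:
    p, m = rate_with_margin(1500, 3000)
    print(f"{p:.1%} +/- {m:.1%}")   # 50.0% +/- 1.8%
```

The margin is largest at p = 0.5. At that worst case, 3,000 runs keep the error margin below about two percentage points. The normal approximation becomes unreliable for rates very close to 0 or 1.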


Evaluation

§ RQ1: Accuracy of the models
§ RQ2: Effectiveness of the ePVF methodology
§ RQ3: Performance


RQ1: Accuracy of the models

§ Recall: from the crash trials of the FI experiments, pick the bit that was flipped in each trial and check whether the model identifies that bit
§ Precision: randomly pick a bit from the models, flip exactly that bit during execution, and check whether a crash occurs

[Charts: recall of the model and precision of the model, y-axis 50% to 100%.]

Our models achieve 89% recall and 92% precision on average.
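The two procedures above reduce to simple ratios. This minimal sketch assumes that bits are identified by hypothetical (instruction id, bit position) pairs. It also assumes the FI campaign has already recorded two things: the bit flipped in each crash trial, and whether each re-injected model bit crashed. None of these names come from LLFI or the ePVF code.

```python
def model_recall(fi_crash_bits, model_bits):
    """Recall: share of the bits flipped in FI crash trials that the model flags."""
    hits = sum(1 for bit in fi_crash_bits if bit in model_bits)
    return hits / len(fi_crash_bits)


def model_precision(sampled_model_bits, crashed):
    """Precision: share of sampled model bits whose injection actually crashed.

    crashed maps each sampled bit to True if flipping exactly that bit
    during execution crashed the program.
    """
    return sum(crashed[bit] for bit in sampled_model_bits) / len(sampled_model_bits)


if __name__ == "__main__":
    fi = [("i3", 12), ("i3", 40), ("i7", 5), ("i9", 60)]          # bits flipped in crash trials
    model = {("i3", 12), ("i3", 40), ("i9", 60), ("i2", 1)}       # bits the model predicts
    sampled = [("i3", 12), ("i9", 60), ("i2", 1)]                 # model bits re-injected
    crashed = {("i3", 12): True, ("i9", 60): True, ("i2", 1): False}
    print(model_recall(fi, model))                                # 0.75
    print(f"{model_precision(sampled, crashed):.2f}")             # 0.67
```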


RQ1: Accuracy of the Models

On average, the ePVF methodology correctly identifies crash-causing bits 90% of the time.


RQ2: Effectiveness of the ePVF

§ SDC estimates using PVF analysis, ePVF analysis, and fault injection

[Chart: PVF value, ePVF value, and SDC rate from FI, y-axis 0% to 100%.]

ePVF tightens the upper bound of the estimated SDC rate by 61% on average.


ePVF-informed Duplication

§ Rank instructions based on their ePVF value
§ ePVF value per instruction = $\frac{\text{ACE Bits} - \text{Crash-causing Bits}}{\text{ACE Bits}}$
§ The higher the ePVF value, the higher the chance that the instruction leads to SDCs
§ Duplicate the highest-ranked ePVF instructions
§ 30% more SDC coverage than hot-path duplication for the same performance overhead
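ePVF-informed selection can be sketched as follows. The sketch assumes that each instruction's ACE and crash-causing bit counts are already known. It also approximates the duplication overhead of an instruction with a single cost number, such as its dynamic execution count. The greedy budgeted selection, the `Instr` fields, and the example numbers are hypothetical. The slide states only that highly ranked instructions are duplicated.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    ace_bits: int
    crash_bits: int
    cost: int  # hypothetical: overhead of duplicating it, e.g. its dynamic execution count

    @property
    def epvf(self):
        """Per-instruction ePVF: (ACE bits - crash-causing bits) / ACE bits."""
        return (self.ace_bits - self.crash_bits) / self.ace_bits if self.ace_bits else 0.0


def select_for_duplication(instrs, budget):
    """Greedily duplicate the highest-ePVF instructions that still fit in the budget."""
    chosen, spent = [], 0
    for ins in sorted(instrs, key=lambda i: i.epvf, reverse=True):
        if spent + ins.cost <= budget:
            chosen.append(ins.name)
            spent += ins.cost
    return chosen


if __name__ == "__main__":
    instrs = [
        Instr("load_idx", ace_bits=64, crash_bits=52, cost=10),   # mostly crash-prone address
        Instr("fmul", ace_bits=64, crash_bits=0, cost=30),        # pure data, SDC-prone
        Instr("add_acc", ace_bits=64, crash_bits=8, cost=20),
        Instr("addr_calc", ace_bits=64, crash_bits=60, cost=5),
    ]
    print(select_for_duplication(instrs, budget=50))   # ['fmul', 'add_acc']
```

Instructions that mainly compute addresses rank low. A fault in them is more likely to crash the program than to corrupt its output silently.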


RQ3: Performance

§ Modeling time ranges from 30 s (lavaMD) to ~4 hours (pathfinder)
  § It depends on the size of the DDG, and hence on the number of dynamic instructions
§ Optimization: sampling and extrapolation
  § Intuition: scientific applications usually have repetitive behavior

[Chart: predicted ePVF vs. computed ePVF, y-axis 0% to 45%.]

ePVF values extrapolated from 10% of the graph differ from the computed values by less than 1% on average.
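The sampling and extrapolation above can be read simply: because ePVF is a ratio, the ratio measured on a representative portion of the trace can stand in for the whole. This minimal sketch takes that reading. It assumes per-instruction bit counts are available and uses a trace prefix as the sample. The prefix choice and the function names are hypothetical. The slides do not describe how their 10% sample is chosen.

```python
def epvf_from_counts(records):
    """records: sequence of (ace_bits, crash_bits, total_bits) per dynamic instruction."""
    ace = crash = total = 0
    for a, c, t in records:
        ace, crash, total = ace + a, crash + c, total + t
    return (ace - crash) / total


def extrapolated_epvf(records, fraction=0.10):
    """Estimate ePVF from the first `fraction` of the trace only.

    Assumption: the program is repetitive, so a prefix of the dynamic
    trace has roughly the same bit composition as the whole trace.
    """
    n = max(1, int(len(records) * fraction))
    return epvf_from_counts(records[:n])


if __name__ == "__main__":
    body = [(64, 20, 64), (64, 0, 64), (32, 32, 64), (0, 0, 64)]   # one loop iteration
    trace = body * 100                                             # perfectly repetitive trace
    print(epvf_from_counts(trace), extrapolated_epvf(trace))       # 0.421875 0.421875
```

For a perfectly repetitive loop, the estimate is exact. Real traces with initialization or phase changes would show some difference between the two values.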


Conclusion

§ ePVF removes the crash-causing bits from PVF to get a more accurate estimate of the SDC rate
§ A crash model that predicts direct crash-causing bits
§ A propagation model that identifies the bits that lead to direct crash-causing bits
§ Implementation with the LLVM compiler
§ Drives selective protection of SDC-causing instructions

Email: [email protected]
https://github.com/flyree/enhancedPVF