Tackling Lack of Software Specifications: A Sustained, Sustainability and Productivity Crisis

Hridesh Rajan
Collaboration with Hoan A. Nguyen, Tien Nguyen, Gary Leavens, Samantha Khairunnesa, John Singleton, Hung Phan, Robert Dyer, and Vasant Honavar

Sustainability and productivity challenge

• To produce critical software infrastructure so it is:
  – of highest quality and free of defects,
  – produced ethically and within budget, and
  – maintainable, upgradeable, portable, scalable, secure.

• Pervasiveness of software infrastructure in such critical areas as power, banking and finance, air traffic control, telecommunication, transportation, national defense, and healthcare requires us to address this challenge.

Software specifications* can help meet this sustainability and productivity challenge.

*Software specifications: formal, often machine-readable, descriptions of software's intended behavior, e.g. {Pre} S {Post} behavioral specifications.
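As a concrete reading of the {Pre} S {Post} notation, the sketch below attaches a precondition and a postcondition to a small Java method and checks both at run time. The method `isqrt`, its contract, and the class name are illustrative assumptions and do not come from the talk.

```java
final class SpecDemo {
    /**
     * Integer square root with its behavioral specification checked at run time.
     * {Pre: x >= 0}  r = isqrt(x)  {Post: r*r <= x < (r+1)*(r+1)}
     * The method and its contract are illustrative assumptions.
     */
    static int isqrt(int x) {
        // Precondition: the caller must supply a non-negative argument.
        if (x < 0) {
            throw new IllegalArgumentException("precondition violated: x >= 0");
        }
        long r = 0;
        while ((r + 1) * (r + 1) <= x) {
            r++;
        }
        // Postcondition: r is the largest integer whose square does not exceed x.
        if (!(r * r <= x && x < (r + 1) * (r + 1))) {
            throw new IllegalStateException("postcondition violated");
        }
        return (int) r;
    }
}
```

A machine-readable specification language such as JML states the same contract declaratively, so tools can check it without hand-written guards.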

• If specifications are widely available, a wide variety of techniques for addressing the sustainability and productivity crisis can be enabled:
  – Maintenance of code can become easier, because engineers will not need to spend time reverse engineering code.
  – Lower cost of code understanding and lower total lifecycle cost.
  – Specification-guided code optimization.
  – Preventing the introduction of new bugs during maintenance.
  – Code reuse.
  – Specification-guided synthesis of code.
  – Modular analysis and verification, leading to scalable tools.

Despite these benefits, useful, non-trivial specifications aren't widely available.

Why aren't software specifications widely available?

– Cost
– Education
– Tools
– Libraries

Unspecified libraries are the root cause:
- increase cost of specification
- make education harder
- make tool support difficult
- make specifying libraries harder

How to solve it? Specify key libraries:
- decrease cost of specification
- make education easier (examples)
- make tool support easier
- make specifying libraries easier

Challenge #1: lower the manual cost of specifying libraries; infer most specifications.
Challenge #2: infer rich but practical specifications; allow code evolution.

Key Ideas

Preconditions can be mined from guard conditions at the call sites of the code using the APIs.

Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff.

    void m(…) {
      …
      if (pred) lib.api();
      …
    }

Mining Preconditions of APIs in Large-scale Code Corpus, FSE'14.
Hoan Nguyen, Robert Dyer*, Tien N. Nguyen

Client code of API String.substring(int,int) in project SeMoA at revision 1929:

    completePath_.substring(servletPathStart, extraPathStart)

Guard conditions at the call site:

    servletPathStart >= 0
    extraPathStart >= 0
    servletPathStart <= completePath_.length()
    extraPathStart <= completePath_.length()
    servletPathStart <= extraPathStart
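The kind of client code that such conditions come from can be sketched as a guarded call. The method below is a hypothetical client, not the SeMoA source. Its guard mirrors the five conditions listed for the call site, and its fallback value is an arbitrary choice for illustration.

```java
final class GuardedSubstring {
    /**
     * Hypothetical client method: the guard before the call encodes the
     * conditions under which String.substring(int,int) does not throw.
     */
    static String slice(String path, int start, int end) {
        if (start >= 0 && end >= 0
                && start <= path.length() && end <= path.length()
                && start <= end) {
            return path.substring(start, end);
        }
        return ""; // fallback when the guard fails (an illustrative choice)
    }
}
```

A miner that reads this method would see each conjunct of the guard as a candidate precondition of `substring`.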

    completePath_.substring(servletPathStart, extraPathStart)

    completePath_.charAt(servletPathStart) == '/'
    completePath_.charAt(extraPathStart) == '/'

Mining pipeline (figure):

    Client methods M1, M2, ..., MN, each calling api(...)
      → Build CFG → Extract and Normalize → Conditions
      → Infer → Candidate Preconditions
      → Filter and Rank → Preconditions

Example condition sets extracted from individual client methods:

    0 <= start, start <= end, end <= length, contains('@')
    0 < start, start <= end, end <= length, ends('\n')
    0 = start, start <= end, end <= length, starts('/')

Candidate preconditions: 0 <= start, start <= end, end <= length, contains('@')

Preconditions: 0 <= start, start <= end, end <= length

Mining from guard conditions across multiple projects proceeds in two steps: a. infer, b. filter and rank.
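The filter-and-rank step can be sketched as a support count over projects: a condition that guards the API in many independent projects is kept, while a condition seen in only one project is treated as chaff. This is a minimal sketch under stated assumptions, not the FSE'14 implementation. The class `PreconditionMiner`, the `CallSite` type, and the `minSupport` threshold are hypothetical, and the conditions are assumed to be already extracted and normalized into strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PreconditionMiner {
    /** One guarded call site: its project and its already-normalized guard conditions. */
    static final class CallSite {
        final String project;
        final Set<String> conditions;

        CallSite(String project, Set<String> conditions) {
            this.project = project;
            this.conditions = conditions;
        }
    }

    /** Keeps conditions seen in at least minSupport (a fraction) of all projects, ranked by support. */
    static List<String> mine(List<CallSite> sites, double minSupport) {
        Set<String> allProjects = new HashSet<>();
        Map<String, Set<String>> projectsByCondition = new HashMap<>();
        for (CallSite site : sites) {
            allProjects.add(site.project);
            for (String condition : site.conditions) {
                projectsByCondition
                        .computeIfAbsent(condition, k -> new HashSet<>())
                        .add(site.project);
            }
        }
        // Filter: drop conditions that too few projects agree on.
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : projectsByCondition.entrySet()) {
            double support = (double) e.getValue().size() / allProjects.size();
            if (support >= minSupport) {
                kept.add(e.getKey());
            }
        }
        // Rank: most widely shared conditions first, ties broken alphabetically.
        kept.sort((a, b) -> {
            int bySupport = Integer.compare(
                    projectsByCondition.get(b).size(), projectsByCondition.get(a).size());
            return bySupport != 0 ? bySupport : a.compareTo(b);
        });
        return kept;
    }
}
```

Counting projects rather than call sites is a deliberate choice in this sketch: a condition repeated many times inside one code base should not outweigh a condition that independent code bases agree on.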

Evaluation – Accuracy

Data collection            SourceForge    Apache
Projects                   3,413          146
Total source files         497,453        132,951
Total classes              600,274        173,120
Total methods              4,735,151      1,243,911
Total SLOCs                92,495,410     25,117,837
Total used JDK classes     806 (63%)      918 (72%)
Total used JDK methods     7,592 (63%)    6,109 (55%)
Total method calls         22,308,251     5,544,437
Total JDK method calls     5,588,487      1,271,210

Almost 120 million SLOCs

Ground Truth

Extracted preconditions from published formal specifications for JDK APIs on the JML website (www.jmlspecs.org):
• 797 methods
• 1155 preconditions

    /*@ public normal_behavior
      @   requires 0 <= beginIndex
      @         && beginIndex <= endIndex
      @         && endIndex <= length();
      @ …

    /*@ public behavior
      @ …
      @   signals (NoSuchElementException) isEmpty();
      @*/

Accuracy of Preconditions Mining

               Precision    Recall    Time
SourceForge    84%          79%       17h35m
Apache         82%          75%       34m
Both           83%          80%       18h03m
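Precision and recall in a table like this are conventionally computed by comparing the mined preconditions against the ground-truth set. The sketch below uses the standard set-based definitions. Treating preconditions as exact-match strings is a simplifying assumption, and the class name is hypothetical; the talk does not describe how matches were decided.

```java
import java.util.HashSet;
import java.util.Set;

final class AccuracyMetrics {
    /** Fraction of mined preconditions that also appear in the ground truth. */
    static double precision(Set<String> mined, Set<String> truth) {
        if (mined.isEmpty()) {
            return 0.0;
        }
        return (double) overlap(mined, truth) / mined.size();
    }

    /** Fraction of ground-truth preconditions that were mined. */
    static double recall(Set<String> mined, Set<String> truth) {
        if (truth.isEmpty()) {
            return 0.0;
        }
        return (double) overlap(mined, truth) / truth.size();
    }

    private static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }
}
```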

Performance
- ~1 minute/condition
- 5 preconditions newly found for JDK API methods that already had JML specifications
- Effective for new specs

Class           Method                     Suggest    Accept
StringBuffer    delete(int,int)            3          Y
                replace(int,int,String)    2          Y*
                setLength(int)             1          Y
                subSequence(int,int)       3          Y
                substring(int,int)         3          Y
LinkedList      add(int,Object)            2          Y
                addAll(int,Collection)     3          Y
                get(int)                   2          Y
                listIterator(int)          2          Y
                remove(int)                2          Y
                set(int,Object)            2          Y
2 classes       11 methods                 25

Accuracy by size (figure): precision, recall, and F-score (0% to 100%) versus data size in projects (1, 2, 4, 8, 16, 32, 64, Full), with one panel each for SourceForge and Apache.

Usefulness Evaluation: Web-based Survey (http://boa.cs.iastate.edu/jml)

Correctness (chart): Correct, Good Starting Point, Incorrect; values shown: 63%, 19%, 18%

Usefulness (chart): Strongly Agree, Agree, Disagree, Strongly Disagree; values shown: 33%, 48%, 13%, 6%

Key Ideas

Additional labels can be mined from implicit beliefs at the call sites of the code using the APIs.

Implicit beliefs mined from multiple projects in a large-scale code corpus can be used to strengthen explicit labels.

    void m(…) {
      …
      O o = new O();
      lib.api(o);
      …
    }

Exploiting Implicit Beliefs to Resolve Sparse Usage Problem in Usage-based Specification Mining, OOPSLA'17.
Hoan Nguyen, S. Khairunnesa, Tien N. Nguyen

Problem: Sparse labels in mined code corpus

Key Ideas

Strongest postcondition inference produces implicitly parallel formulas.

Flattening and recombining parallel formulas can lead to much simpler inferred specifications.

An Algorithm and Tool to Infer Practical Postconditions, ongoing work.
John Singleton, Gary T. Leavens

Problem: Using extant work, e.g. strongest postcondition (sp), for postcondition inference produces impractical specs.

sp (IF B THEN S1 ELSE S2) P = (sp S1 (P ∧ B)) ∨ (sp S2 (P ∧ ¬B))
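To see why applying this rule directly yields impractical specifications, the sketch below applies it to a tiny statement tree and prints the resulting formula as text. It is an illustration under stated assumptions, not the authors' algorithm or tool: the classes `Atom`, `Seq`, and `If` are hypothetical, atomic statements are left symbolic as `sp S P`, sequencing uses the usual rule sp (S1; S2) P = sp S2 (sp S1 P), and the output uses ASCII `&&`, `||`, `!` in place of ∧, ∨, ¬.

```java
final class StrongestPostSketch {
    interface Stmt {}

    /** Atomic statement whose strongest postcondition is left symbolic. */
    static final class Atom implements Stmt {
        final String name;
        Atom(String name) { this.name = name; }
    }

    /** Sequential composition: first; second. */
    static final class Seq implements Stmt {
        final Stmt first;
        final Stmt second;
        Seq(Stmt first, Stmt second) { this.first = first; this.second = second; }
    }

    /** Conditional: IF cond THEN thenS ELSE elseS. */
    static final class If implements Stmt {
        final String cond;
        final Stmt thenS;
        final Stmt elseS;
        If(String cond, Stmt thenS, Stmt elseS) {
            this.cond = cond;
            this.thenS = thenS;
            this.elseS = elseS;
        }
    }

    /** Parenthesizes a formula unless it is a single token. */
    private static String paren(String f) {
        return f.indexOf(' ') >= 0 ? "(" + f + ")" : f;
    }

    /** Renders sp(s, p) as text, expanding only the IF and sequencing rules. */
    static String sp(Stmt s, String p) {
        if (s instanceof If) {
            If i = (If) s;
            String thenPost = sp(i.thenS, paren(p) + " && " + paren(i.cond));
            String elsePost = sp(i.elseS, paren(p) + " && !" + paren(i.cond));
            return "(" + thenPost + ") || (" + elsePost + ")";
        }
        if (s instanceof Seq) {
            Seq q = (Seq) s;
            return sp(q.second, sp(q.first, p));
        }
        return "sp " + ((Atom) s).name + " " + paren(p);
    }
}
```

In this sketch, every conditional copies its incoming precondition into both branches, so a sequence of conditionals more than doubles the formula at each step. That growth is the kind of blow-up that flattening and recombining is meant to counter.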

Specification Reduction

Impact: 84% of specifications < ¼ page in length

We are overcoming the lack of software specifications, a critical hurdle for high-assurance SE, by combining program analysis and data mining.

boa.cs.iastate.edu

hridesh@iastate.edu
