Tackling Lack of Software Specifications: A Sustained, Sustainability and Productivity Crisis
Hridesh Rajan
Collaboration with Hoan A. Nguyen, Tien Nguyen, Gary Leavens, Samantha Khairunnesa, John Singleton, Hung Phan, Robert Dyer, and Vasant Honavar
Sustainability and productivity challenge
• To produce critical software infrastructure so it is:
  – of highest quality and free of defects,
  – produced ethically and within budget, and
  – maintainable, upgradeable, portable, scalable, secure.
• The pervasiveness of software infrastructure in such critical areas as power, banking and finance, air traffic control, telecommunication, transportation, national defense, and healthcare requires us to address this challenge.
Software specifications* can help meet this sustainability and productivity challenge.
*Software specifications: formal, often machine-readable descriptions of software's intended behavior, e.g. {Pre} S {Post} behavioral specifications
Sustainability and productivity challenge
• If specifications are widely available, a wide variety of techniques for addressing the sustainability and productivity crisis can be enabled.
  – Maintenance of code can become easier, because engineers will not need to spend time reverse engineering code
  – Lower cost of code understanding, lower total lifecycle cost
  – Specification-guided code optimization
  – Prevent introducing new bugs during maintenance
  – Code reuse
  – Specification-guided synthesis of code
  – Modular analysis and verification, leading to scalable tools
Despite these benefits, useful, non-trivial specifications aren't widely available
Why aren't software specifications widely available?
Cost, Education, Tools, Libraries
Unspecified libraries are the root cause
- increase cost of specification
- make education harder
- make tool support difficult
- make specifying libraries harder
How to solve it? Specify key libraries
- decrease cost of specification
- make education easier (examples)
- make tool support easier
- make specifying libraries easier
How to solve it? Specify key libraries
Challenge #1: lower manual cost of specifying libraries, infer most
Challenge #2: infer rich, but practical specifications, allow code evolution
Key Ideas
Preconditions can be mined from guarded conditions at the call sites of the code using the APIs
Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff
void m(…) {
  …
  if (pred) lib.api();
  …
}
Mining Preconditions of APIs in Large-scale Code Corpus, FSE'14.
Hoan Nguyen, Robert Dyer*, Tien N. Nguyen
Client code of API String.substring(int,int) in project SeMoA at revision 1929
completePath_.substring(servletPathStart, extraPathStart)
servletPathStart >= 0
extraPathStart >= 0
servletPathStart <= completePath_.length()
extraPathStart <= completePath_.length()
servletPathStart <= extraPathStart
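As a rough illustration of what "guarded conditions at the call site" means, the sketch below shows a hypothetical client method (the class, method, and parameter names are made up for this example and are not the SeMoA code) whose `if` guard implies all five conditions listed above. Without such a guard, `String.substring(int,int)` throws an `IndexOutOfBoundsException` for out-of-range or reversed indices.

```java
/** Illustrative client code; the class, method, and parameter names are hypothetical. */
final class GuardedCallSiteSketch {

    /** Returns the servlet path, or an empty string when the guard rejects the indices. */
    static String servletPath(String completePath, int servletPathStart, int extraPathStart) {
        // The guard below is what a miner could read as candidate preconditions
        // of String.substring(int,int) at this call site.
        if (servletPathStart >= 0
                && servletPathStart <= extraPathStart
                && extraPathStart <= completePath.length()) {
            return completePath.substring(servletPathStart, extraPathStart);
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(servletPath("/app/servlet/extra", 4, 12)); // prints "/servlet"
        System.out.println(servletPath("/app", 3, 1).isEmpty());      // prints "true"
    }
}
```

The three conjuncts are enough here because the remaining two conditions follow from them: `extraPathStart >= 0` from the first two, and `servletPathStart <= completePath.length()` from the last two.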
completePath_.substring(servletPathStart, extraPathStart)
completePath_.charAt(servletPathStart) == '/'
completePath_.charAt(extraPathStart) == '/'
[Figure: mining pipeline. For each client method M1, M2, ..., MN that calls api(...): Build CFG → Extract and Normalize → Conditions. Conditions from all client methods → Infer → Candidate Preconditions → Filter and Rank → Preconditions.]
Conditions, client method M1: 0 <= start, start <= end, end <= length, contains('@')
Conditions, client method M2: 0 = start, start <= end, end <= length, starts('/')
Conditions, client method MN: 0 < start, start <= end, end <= length, ends('\n')
Candidate preconditions: 0 <= start, start <= end, end <= length, contains('@')
Preconditions: 0 <= start, start <= end, end <= length
Key Ideas
Preconditions can be mined from guarded conditions at the call sites of the code using the APIs
Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff: a. infer, b. filter and rank
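The infer and filter-and-rank steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the FSE'14 implementation: conditions are plain strings, the infer step only weakens `a < b` and `a = b` to `a <= b`, and the project names, the `minShare` threshold, and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative sketch only; not the implementation from the FSE'14 paper. */
final class PreconditionFilterSketch {

    /** Infer step (simplified): add the weaker "a <= b" for every "a < b" or "a = b". */
    static Set<String> withWeakerForms(Set<String> conditions) {
        Set<String> result = new TreeSet<>(conditions);
        for (String condition : conditions) {
            String[] parts = condition.split(" ");
            if (parts.length == 3 && (parts[1].equals("<") || parts[1].equals("="))) {
                result.add(parts[0] + " <= " + parts[2]);
            }
        }
        return result;
    }

    /** Filter and rank step: keep conditions seen in at least minShare of the projects. */
    static List<String> filterAndRank(Map<String, Set<String>> conditionsByProject, double minShare) {
        Map<String, Integer> projectCount = new TreeMap<>();
        for (Set<String> conditions : conditionsByProject.values()) {
            for (String condition : withWeakerForms(conditions)) {
                projectCount.merge(condition, 1, Integer::sum);
            }
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : projectCount.entrySet()) {
            if (entry.getValue() >= minShare * conditionsByProject.size()) {
                kept.add(entry.getKey());
            }
        }
        // Most widely shared first; ties broken alphabetically for a stable order.
        kept.sort((a, b) -> {
            int byCount = Integer.compare(projectCount.get(b), projectCount.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return kept;
    }

    /** Hypothetical corpus: three projects guarding the same api(...) call. */
    static Map<String, Set<String>> demoCorpus() {
        Map<String, Set<String>> corpus = new LinkedHashMap<>();
        corpus.put("projectA", new TreeSet<>(Arrays.asList(
                "0 <= start", "start <= end", "end <= length", "contains('@')")));
        corpus.put("projectB", new TreeSet<>(Arrays.asList(
                "0 = start", "start <= end", "end <= length", "starts('/')")));
        corpus.put("projectC", new TreeSet<>(Arrays.asList(
                "0 < start", "start <= end", "end <= length", "ends('\\n')")));
        return corpus;
    }

    public static void main(String[] args) {
        // Conditions shared by at least half of the projects survive; the rest is chaff.
        System.out.println(filterAndRank(demoCorpus(), 0.5));
        // prints [0 <= start, end <= length, start <= end]
    }
}
```

Project-specific guards such as `contains('@')` appear in only one project and fall below the threshold, while `0 <= start` is recovered from all three projects once the stricter forms are weakened.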
Evaluation – Accuracy
Data collection              SourceForge     Apache
Projects                     3,413           146
Total source files           497,453         132,951
Total classes                600,274         173,120
Total methods                4,735,151       1,243,911
Total SLOCs                  92,495,410      25,117,837
Total used JDK classes       806 (63%)       918 (72%)
Total used JDK methods       7,592 (63%)     6,109 (55%)
Total method calls           22,308,251      5,544,437
Total JDK method calls       5,588,487       1,271,210
Almost 120 million SLOCs
Ground Truth
Extracted preconditions from published formal specifications for JDK APIs on the JML website (www.jmlspecs.org)
• 797 methods
• 1155 preconditions

/*@ public normal_behavior
  @ requires 0 <= beginIndex
  @       && beginIndex <= endIndex
  @       && endIndex <= length();
  @ …

/*@ public behavior
  @ …
  @ signals (NoSuchElementException) isEmpty();
  @*/
Accuracy of Preconditions Mining
              Precision   Recall   Time
SourceForge   84%         79%      17h35m
Apache        82%         75%      34m
Both          83%         80%      18h03m
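For readers less familiar with these measures, the sketch below shows one way precision, recall, and F-score could be computed against a ground-truth set. It assumes mined and ground-truth preconditions are compared by exact string match and that F-score means the harmonic mean of precision and recall (F1). Both are assumptions made for illustration, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch; exact string matching is an assumption, not the talk's method. */
final class MiningAccuracySketch {

    /** Number of mined preconditions that also appear in the ground truth. */
    static int overlap(Set<String> mined, Set<String> groundTruth) {
        Set<String> common = new HashSet<>(mined);
        common.retainAll(groundTruth);
        return common.size();
    }

    /** Share of mined preconditions that are correct. */
    static double precision(Set<String> mined, Set<String> groundTruth) {
        return mined.isEmpty() ? 0.0 : (double) overlap(mined, groundTruth) / mined.size();
    }

    /** Share of ground-truth preconditions that were found. */
    static double recall(Set<String> mined, Set<String> groundTruth) {
        return groundTruth.isEmpty() ? 0.0 : (double) overlap(mined, groundTruth) / groundTruth.size();
    }

    /** Harmonic mean of precision and recall (F1). */
    static double fScore(double precision, double recall) {
        return precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        Set<String> mined = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> truth = new HashSet<>(Arrays.asList("a", "b", "c", "e", "f"));
        System.out.println(precision(mined, truth)); // prints 0.75
        System.out.println(recall(mined, truth));    // prints 0.6
        // F1 for the "Both" row above (precision 83%, recall 80%) is about 0.81.
        System.out.printf("%.2f%n", fScore(0.83, 0.80));
    }
}
```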
Performance
- ~1 minute/condition
- 5 preconditions are newly found for JDK API methods that already had JML specifications
- Effective for new specs
Class          Method                     Suggest   Accept
StringBuffer   delete(int,int)            3         Y
               replace(int,int,String)    2         Y*
               setLength(int)             1         Y
               subSequence(int,int)       3         Y
               substring(int,int)         3         Y
LinkedList     add(int,Object)            2         Y
               addAll(int,Collection)     3         Y
               get(int)                   2         Y
               listIterator(int)          2         Y
               remove(int)                2         Y
               set(int,Object)            2         Y
2 classes      11 methods                 25
Accuracy by size
[Figure: two panels (SourceForge, Apache) plotting Precision, Recall, and F-score (0% to 100%) against data size in projects (1, 2, 4, 8, 16, 32, 64, Full).]
Usefulness Evaluation
Web-based survey: http://boa.cs.iastate.edu/jml
Correctness: Correct 63%, Good Starting Point 19%, Incorrect 18%
Usefulness: Strongly Agree 33%, Agree 48%, Disagree 13%, Strongly Disagree 6%
Key Ideas
Additional labels can be mined from implicit beliefs at the call sites of the code using the APIs
Implicit beliefs mined from multiple projects in a large-scale code corpus can be used to strengthen explicit labels
void m(…) {
  …
  O o = new O();
  lib.api(o);
  …
}
Exploiting Implicit Beliefs to Resolve Sparse Usage Problem in Usage-based Specification Mining, OOPSLA'17.
Hoan Nguyen, S. Khairunnesa, Tien N. Nguyen
Problem: Sparse labels in mined code corpus
Key Ideas
Strongest postcondition inference produces implicitly parallel formulas
Flattening and recombining parallel formulas can lead to much simpler inferred specifications.
An Algorithm and Tool to Infer Practical Postconditions, ongoing work.
John Singleton, Gary T. Leavens
Problem: Using extant work, e.g. strongest postcondition (sp), for postcondition inference produces impractical specs
sp (IF B THEN S1 ELSE S2) P = (sp S1 (P ∧ B)) ∨ (sp S2 (P ∧ ¬B))
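To see why this rule tends to produce large formulas, it can be applied to a small, hypothetical absolute-value statement (the program and variable names are illustrative, not from the talk). The derivation assumes the usual assignment rule, which reduces to sp (y := e) P = P ∧ y = e when y occurs in neither P nor e:

```latex
\begin{align*}
&\mathit{sp}\,(\texttt{IF } x < 0 \texttt{ THEN } y := -x \texttt{ ELSE } y := x)\ \mathit{true} \\
&\quad = (\mathit{sp}\,(y := -x)\,(\mathit{true} \land x < 0))
   \lor (\mathit{sp}\,(y := x)\,(\mathit{true} \land \lnot (x < 0))) \\
&\quad = (x < 0 \land y = -x) \lor (x \ge 0 \land y = x)
\end{align*}
```

Each conditional contributes one disjunct per branch, so applying the rule naively to n sequential conditionals can yield up to 2^n disjuncts. In this small case the two disjuncts are equivalent to y = |x|, which hints at how recombining such formulas might shorten an inferred specification.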
Specification Reduction
Impact: 84% of specifications < ¼ page in length
We are overcoming the lack of software specifications, a critical hurdle for high-assurance SE, by combining program analysis and data mining.
boa.cs.iastate.edu