233
January 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann Yuan Yu Systems Research Center 130 Lytton Avenue Palo Alto, California 94301 http://www.research.compaq.com/SRC/

Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Embed Size (px)

Citation preview

Page 1: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

January22,2002

SRCResearchReport 177

The VestaSoftwareConfigurationManagementSystem

Allan HeydonRoy Levin

Timothy MannYuanYu

SystemsResearch Center130LyttonAvenuePaloAlto, California94301

http://www.research.compaq.com/SRC/

Page 2: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Compaq SystemsResearch Center

SRC’scharteris to advancethestateof theart in computersystemsby doingbasicandappliedresearchin supportof ourcompany’sbusinessobjectives.Ourinterestsandprojectsspanscalablesystems(including hardware, networking, distributedsystems,andprogramming-languagetechnology),theInternet(includingtheWeb,e-commerce,andinformationretrieval), andhuman/computerinteraction(includ-ing user-interfacetechnology, computer-basedappliances,andmobilecomputing).SRCwasestablishedin 1984by Digital EquipmentCorporation.

We testthevalueof our ideasby building hardwareandsoftwareprototypesandassessingtheir utility in realistic settings. Interestingsystemsare too complexto be evaluatedsolely in theabstract;practicaluseenablesus to investigatetheirpropertiesin depth. This experienceis useful in the short term in refining ourdesignsandinvaluablein thelong termin advancingour knowledge.Most of themajoradvancesin informationsystemshavecomethroughthisapproach,includingpersonalcomputing,distributedsystems,andtheInternet.

Wealsoperformcomplementarywork of amoremathematicalcharacter. Someofthat lies in establishedfieldsof theoreticalcomputerscience,suchastheanalysisof algorithms,computer-aidedgeometricdesign,securityandcryptography, andformal specificationandverification. Otherwork exploresnew groundmotivatedby problemsthatarisein oursystemsresearch.

Wearestronglycommittedto communicatingour results;exposingandtestingourideasin theresearchanddevelopmentcommunitiesleadsto improvedunderstand-ing. Our researchreportseriessupplementspublicationin professionaljournalsandconferences,while our technicalnoteseriesallows timely disseminationof re-centresearchfindings.Weseekusersfor ourprototypesystemsamongthosewithwhomwe have commoninterests,andwe encouragecollaborationwith universityresearchers.

Page 3: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

The VestaSoftwareConfigurationManagementSystem

Allan Heydon,Roy Levin, TimothyMann,andYuanYu

January22,2002

Page 4: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Allan Heydon, Roy Levin, and Timothy Mann are no longer with Compaq.They canbereachedby electronicmail attheaddressescaheydon@yahoo. co m,[email protected] , [email protected] rg .

TheVestaproject’s Websiteis at http://www.ves ta sy s.o rg / .

c�

CompaqComputer Corporation 2002

Thiswork maynotbecopiedor reproducedin wholeor in partfor any commercialpurpose.Permissionto copy in wholeor in partwithout paymentof feeis grantedfor nonprofiteducationaland researchpurposesprovided that all suchwhole orpartial copiesincludethe following: a noticethat suchcopying is by permissionof the SystemsResearchCenterof CompaqComputerCorporationin Palo Alto,California; an acknowledgmentof the authorsand individual contributors to thework; andall applicableportionsof the copyright notice. Copying, reproducing,or republishingfor any otherpurposeshallrequirea licensewith paymentof feetotheSystemsResearchCenter. All rightsreserved.

Page 5: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Abstract

Vestais a systemfor softwareconfigurationmanagement.It storescollectionsofsourcefiles,keepstrackof whichversionsof whichfilesgotogether, andautomatesthe processof building a completesoftware artifact from its componentpieces.Unlike othersoftwareconfigurationmanagementsystems,Vestawasspecificallydesignedto handlevery large projects—tensof millions of lines of codeandbe-yond. Vesta’s novel approachgives it threeimportantpropertiesnot available inothersystems.First,every build is repeatable, becauseits componentsourcesandbuild toolsarestoredimmutablyandimmortally, andits configurationdescriptioncompletelydescribeswhat componentsandtools areusedandhow they areputtogether. Second,everybuild is incremental, becauseresultsof previousbuildsarecachedandreused.Third, everybuild is consistent, becauseall build dependenciesareautomaticallycaptured,recorded,andchecked, so thata cachedresultfrom aprevious build is reusedonly whendoingso is certainto be correct. In addition,Vesta’s flexible languagefor writing configurationdescriptionsmakes it easytodescribelarge softwareconfigurationsin a modularfashionandto createvariantconfigurationsby customizingbuild parameters.This reportdescribesthe Vestatechnologyin detailanddiscussestheperformanceof our implementation.

iii

Page 6: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

iv

Page 7: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Contents

1 Intr oduction 1

2 RelatedWork 52.1 RCS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.2 CVS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.3 Make . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.4 DSEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.5 ClearCASE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.6 Vesta-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.7 OtherSystems. . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3 SystemOverview 143.1 SystemComponents . . . . . . . . . . . . . . . . . . . . . . . . 14

3.1.1 SourceControlComponents. . . . . . . . . . . . . . . . 153.1.2 Build Components. . . . . . . . . . . . . . . . . . . . . 173.1.3 StorageComponents. . . . . . . . . . . . . . . . . . . . 203.1.4 ModelsandModularity . . . . . . . . . . . . . . . . . . . 21

3.2 Foundations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.3 DesignTargets . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Repository 254.1 TheUser’s View . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.1.1 SourceNamespace. . . . . . . . . . . . . . . . . . . . . 264.1.2 Versioning . . . . . . . . . . . . . . . . . . . . . . . . . 274.1.3 NamingConvention . . . . . . . . . . . . . . . . . . . . 284.1.4 DevelopmentCycleTools . . . . . . . . . . . . . . . . . 304.1.5 MutableFilesandDirectories . . . . . . . . . . . . . . . 354.1.6 MutableAttributes . . . . . . . . . . . . . . . . . . . . . 364.1.7 AccessControl . . . . . . . . . . . . . . . . . . . . . . . 37

v

Page 8: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

4.2 TheEvaluator’s View . . . . . . . . . . . . . . . . . . . . . . . . 414.2.1 DerivedFilesandShortids . . . . . . . . . . . . . . . . . 414.2.2 EvaluatorDirectoriesandVolatileDirectories . . . . . . . 434.2.3 Fingerprints. . . . . . . . . . . . . . . . . . . . . . . . . 44

4.3 TheImplementor’s View . . . . . . . . . . . . . . . . . . . . . . 464.3.1 ShortidsandFiles. . . . . . . . . . . . . . . . . . . . . . 464.3.2 Directories . . . . . . . . . . . . . . . . . . . . . . . . . 474.3.3 Longids . . . . . . . . . . . . . . . . . . . . . . . . . . . 494.3.4 Copy on Write . . . . . . . . . . . . . . . . . . . . . . . 514.3.5 NFSInterface. . . . . . . . . . . . . . . . . . . . . . . . 524.3.6 RPCInterfaces . . . . . . . . . . . . . . . . . . . . . . . 53

4.4 Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534.4.1 GlobalNamespace. . . . . . . . . . . . . . . . . . . . . 544.4.2 Mastership . . . . . . . . . . . . . . . . . . . . . . . . . 554.4.3 ReplicationExample . . . . . . . . . . . . . . . . . . . . 564.4.4 Agreement . . . . . . . . . . . . . . . . . . . . . . . . . 574.4.5 Agreement-PreservingPrimitives . . . . . . . . . . . . . 594.4.6 PropagatingAttributes . . . . . . . . . . . . . . . . . . . 624.4.7 ReplicationAccessControl. . . . . . . . . . . . . . . . . 634.4.8 TheReplicator . . . . . . . . . . . . . . . . . . . . . . . 634.4.9 Cross-RepositoryCheckout . . . . . . . . . . . . . . . . 64

5 SystemModeling Language 685.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695.2 LanguageHighlights . . . . . . . . . . . . . . . . . . . . . . . . 70

5.2.1 TheEnvironmentParameter . . . . . . . . . . . . . . . . 725.2.2 Bindings . . . . . . . . . . . . . . . . . . . . . . . . . . 725.2.3 Tool Encapsulation. . . . . . . . . . . . . . . . . . . . . 755.2.4 Closures . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.3 ModelOrganization. . . . . . . . . . . . . . . . . . . . . . . . . 765.3.1 Propertiesof Models . . . . . . . . . . . . . . . . . . . . 775.3.2 Requirementson ModelOrganization . . . . . . . . . . . 785.3.3 ModelHierarchies . . . . . . . . . . . . . . . . . . . . . 795.3.4 Imports . . . . . . . . . . . . . . . . . . . . . . . . . . . 815.3.5 ControlPanelModels . . . . . . . . . . . . . . . . . . . 82

5.4 TheStandardConstructionEnvironment . . . . . . . . . . . . . . 835.4.1 Componentsof theStandardEnvironment . . . . . . . . . 845.4.2 Building Library Models . . . . . . . . . . . . . . . . . . 845.4.3 Building ApplicationModels. . . . . . . . . . . . . . . . 865.4.4 CustomizingBuilds . . . . . . . . . . . . . . . . . . . . . 88

vi

Page 9: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

5.4.5 HandlingLargeScaleSoftware . . . . . . . . . . . . . . 90

6 Function Cache 916.1 Dynamic,Fine-GrainedDependencies. . . . . . . . . . . . . . . 926.2 TheCachingProblem. . . . . . . . . . . . . . . . . . . . . . . . 936.3 Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 986.4 SystemRequirements. . . . . . . . . . . . . . . . . . . . . . . . 996.5 Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . 101

7 Evaluator 1067.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1067.2 RepresentingDependencies. . . . . . . . . . . . . . . . . . . . . 1087.3 CachingExternalTool Invocations . . . . . . . . . . . . . . . . . 1097.4 CachingUser-DefinedFunctionEvaluations. . . . . . . . . . . . 111

7.4.1 ComputingPrimaryandSecondaryCacheKeys . . . . . . 1127.4.2 Relationto Caching . . . . . . . . . . . . . . . . . . . . 1137.4.3 Dependency Types . . . . . . . . . . . . . . . . . . . . . 1147.4.4 Dependency CalculationRules. . . . . . . . . . . . . . . 1157.4.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . 1177.4.6 Correctness. . . . . . . . . . . . . . . . . . . . . . . . . 118

7.5 CachingModelEvaluationsandModelValues. . . . . . . . . . . 1197.5.1 ModelEvaluations . . . . . . . . . . . . . . . . . . . . . 1197.5.2 ModelValues . . . . . . . . . . . . . . . . . . . . . . . . 121

7.6 ExampleEvaluationCall Graphs . . . . . . . . . . . . . . . . . . 1227.6.1 ScratchBuild of theStandardEnvironment . . . . . . . . 1227.6.2 ScratchBuild of theVestaUmbrellaLibrary . . . . . . . . 1227.6.3 ScratchandIncrementalBuilds of theEvaluator. . . . . . 125

7.7 Error Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

8 Weeder 1298.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1298.2 WeedingInstructions . . . . . . . . . . . . . . . . . . . . . . . . 1318.3 Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . 131

9 Performance 1359.1 HardwareConfiguration . . . . . . . . . . . . . . . . . . . . . . 1369.2 OverallSystemPerformance. . . . . . . . . . . . . . . . . . . . 136

9.2.1 PerformanceComparisonwith Make . . . . . . . . . . . 1379.2.2 PerformanceBreakdown . . . . . . . . . . . . . . . . . . 1399.2.3 CachingAnalysis . . . . . . . . . . . . . . . . . . . . . . 140

vii

Page 10: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

9.2.4 ResourceUsage. . . . . . . . . . . . . . . . . . . . . . . 1429.3 RepositoryPerformance . . . . . . . . . . . . . . . . . . . . . . 144

9.3.1 Speedof File Operations. . . . . . . . . . . . . . . . . . 1459.3.2 Disk andMemoryConsumption. . . . . . . . . . . . . . 1489.3.3 Speedof RepositoryTools . . . . . . . . . . . . . . . . . 1519.3.4 Speedof Cross-RepositoryTools. . . . . . . . . . . . . . 1529.3.5 Speedof theReplicator. . . . . . . . . . . . . . . . . . . 154

9.4 FunctionCachePerformance. . . . . . . . . . . . . . . . . . . . 1559.4.1 ServerPerformance. . . . . . . . . . . . . . . . . . . . . 1559.4.2 StableCacheAttributes. . . . . . . . . . . . . . . . . . . 1569.4.3 ResourceUsage. . . . . . . . . . . . . . . . . . . . . . . 1579.4.4 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . 158

9.5 WeederPerformance . . . . . . . . . . . . . . . . . . . . . . . . 1599.6 InterprocessCommunication. . . . . . . . . . . . . . . . . . . . 160

10 Conclusions 161

A The VestaSystemDescription Language 164A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164A.2 Lexical Conventions . . . . . . . . . . . . . . . . . . . . . . . . 165

A.2.1 Meta-notation. . . . . . . . . . . . . . . . . . . . . . . . 165A.2.2 Terminals . . . . . . . . . . . . . . . . . . . . . . . . . . 166

A.3 Semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166A.3.1 ValueSpace. . . . . . . . . . . . . . . . . . . . . . . . . 167A.3.2 TypeDeclarations . . . . . . . . . . . . . . . . . . . . . 168A.3.3 EvaluationRules . . . . . . . . . . . . . . . . . . . . . . 168

A.3.3.1 Expr . . . . . . . . . . . . . . . . . . . . . . . 170A.3.3.2 Literal . . . . . . . . . . . . . . . . . . . . . . 171A.3.3.3 Id . . . . . . . . . . . . . . . . . . . . . . . . . 171A.3.3.4 List . . . . . . . . . . . . . . . . . . . . . . . . 171A.3.3.5 Binding . . . . . . . . . . . . . . . . . . . . . 174A.3.3.6 Select . . . . . . . . . . . . . . . . . . . . . . 176A.3.3.7 Block . . . . . . . . . . . . . . . . . . . . . . . 177A.3.3.8 Stmt . . . . . . . . . . . . . . . . . . . . . . . 177A.3.3.9 Assign . . . . . . . . . . . . . . . . . . . . . . 178A.3.3.10 Iterate . . . . . . . . . . . . . . . . . . . . . . 178A.3.3.11 FuncDef . . . . . . . . . . . . . . . . . . . . . 179A.3.3.12 FuncCall . . . . . . . . . . . . . . . . . . . . . 182A.3.3.13 Model . . . . . . . . . . . . . . . . . . . . . . 183A.3.3.14 Files . . . . . . . . . . . . . . . . . . . . . . . 185

viii

Page 11: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

A.3.3.15 Imports. . . . . . . . . . . . . . . . . . . . . . 188A.3.3.16 FilenameInterpretation . . . . . . . . . . . . . 192

A.3.4 Primitives . . . . . . . . . . . . . . . . . . . . . . . . . . 193A.3.4.1 Functionson Typet bool . . . . . . . . . . . . 193A.3.4.2 Functionson Typet int . . . . . . . . . . . . . 194A.3.4.3 Functionson Typet text . . . . . . . . . . . . . 195A.3.4.4 Functionson Typet list . . . . . . . . . . . . . 197A.3.4.5 Functionson Typet binding . . . . . . . . . . . 199A.3.4.6 SpecialPurposeFunctions. . . . . . . . . . . . 202A.3.4.7 TypeManipulationFunctions . . . . . . . . . . 203A.3.4.8 Tool InvocationFunction . . . . . . . . . . . . 205A.3.4.9 DiagnosticFunctions . . . . . . . . . . . . . . 210

A.4 ConcreteSyntax. . . . . . . . . . . . . . . . . . . . . . . . . . . 211A.4.1 Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . 211A.4.2 Ambiguity Resolution . . . . . . . . . . . . . . . . . . . 214A.4.3 Tokens . . . . . . . . . . . . . . . . . . . . . . . . . . . 214A.4.4 ReservedIdentifiers . . . . . . . . . . . . . . . . . . . . 216

ix

Page 12: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

x

Page 13: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 1

Intr oduction

ThisreportdescribesVesta[25, 26,35], asystemthataddressestwo coreproblemsin developinglargesoftwareprojects:versioningandbuilding.

Versioningis a significantproblemfor large-scalesoftwaresystemsbecausesoftwareevolvesandchangesover time. Seriousdisparitiesoften exist betweenthesourcecodeusedto producethelastshippedversionof asoftwareproductandthe currentsourcesunderdevelopment,yet bugsneedto be fixed in both. Manydevelopersmaybeworkingonthecurrentsourcesat thesametime,yeteachneedsto testhis changesin isolation from changesmadeby others. Thus a powerfulversioningsystemis essential:developersmustbeableto create,name,track,andcontrolmany versionsof thesources.

Building is alsoa significantproblem.Without someform of automatedsup-port,thetaskof compilingorotherwiseprocessingsourcefilesandcombiningtheminto afinishedsystemwouldbetime-consuming,error-prone,andlikely to produceinconsistentresults. As a softwaresystemgrows, this taskbecomesincreasinglydifficult. No existingautomatedbuild systemis reliable,efficient,easy-to-use,andgeneralenoughto handleamulti-million line softwareprojectsatisfactorily.

Versioningand building are two partsof a larger problemareathat is oftencalled software configuration management(SCM). Consideredbroadly, SCM issometimestakento includesuchareasassoftwarelife-cycle management,processmanagement,andthe specifictools usedto develop andevolve softwarecompo-nents. We take the view that theseaspectsof SCM, althoughimportantto theoverall softwaredevelopmentprocess,aresecondaryto thecoreissuesof version-ing andbuilding. We have thereforefocusedtheVestaprojecton solvingthecoreproblems,constructinga solid baseuponwhich we believe solutionsto theotherproblemscanbebuilt.

In ourapproach,thegeneralproblemof versioningbreaksdown into two parts:

1

Page 14: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

version managementandsourcecontrol. Building breaksdown into systemmod-eling andmodelevaluation. Wenow definetheseterms.

VersionManagement.Versionmanagementis theprocessof assigningnamesto aseriesof relatedsourcefilesandsupportingretrieval of thosefilesby name.Versionmanagementcanalsoapplyto derivedfiles(filescreatedmechanicallyby thebuildsystem).Vesta’s versionmanagementsystem,however, dealsonly with sources;derivedfiles (or derivedsfor short)aremanagedentirelyby thebuild system.

SourceControl. Sourcecontrolis theprocessof controllingor regulatingthepro-duction of new versionsof sourcefiles. Operationscommonlyassociatedwithsourcecontrol include checkout and checkin, which respectively reserve a newversionname(typically a number)andsupply the datafor a previously reservedversion.Sourcecontrolmaybecoupledwith concurrency controlaswell, so thatcheckingout a particularversionlimits theability of otherusersto checkout re-latedversions.

SystemModeling. A systemmodelnamesthe softwarecomponentsthat are tobecombinedto producelargercomponentsor entiresystems,namesthetoolsthatare to be usedto combinethem, and specifieshow the tools are to be applied.Configuration descriptionis anequivalentterm.

Model Evaluation. A systemmodelcanbeviewedeitherasastaticdescriptionofa system’s configuration,or asanexecutableprogramthatdescribeshow to buildthesystem.Model evaluationmeanstakingthesecondview: runninga builder orevaluator againsta model,in orderto constructa completesystemby processingandcombiningits componentsaccordingto themodel’s instructions.

The SCM problembecomesmoredifficult as the sizeof the softwareunderdevelopmentgrows,asthenumberof developersusingtheSCMsystemincreases,asthenumberof geographicallydistributeddevelopmentsitesgrows,andasmorereleasesareproduced. To handlelarge-scale,multi-developer, multi-site, multi-releasesoftware development,we believe an SCM systemmust guaranteethatbuilds are repeatable, incremental, andconsistent. Existing SCM systemsoftenfail to provide theseproperties.

Repeatability. In anenvironmentwheremultiple versionsarebeingdevelopedinparallel,theability to exactly repeatapreviousbuild is invaluable.For example,ifacustomerreportsabug in anolderversionof a product,it is importantto beableto quickly recreatethefaulty executable,debug it, anddevelopa modifiedversionthatfixesthebug.

Repeatabilityis an easygoal to stateandto appreciate,but a difficult goal toattain. Most build systemsin usetoday do not guaranteerepeatabilitybecause

2

Page 15: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

their build resultsaredependenton someaspectof thebuilding environmentthatthesystemdoesnotcontrol.Hence,theall toocommon“it worksonmy machine”syndrome.Thefirst vital stepto achieving repeatabilityis to storesourcefiles andbuild toolsimmutablyandimmortally, sothatthey areavailablewhenneeded.Thesecondis to ensurethat systemmodelsarecomplete, recordingpreciselywhichversionsof which sourceswent into a build, which versionsof tools (suchasthecompiler)wereused,which command-lineswitchesweresuppliedto thosetools,andall otherrelevantaspectsof thebuilding environment.

Incrementality. It is crucial for performancethat the builder be incremental,reusingthe resultsof previous builds wherever possible. Without reliable incre-mentalbuilding, a developmentorganizationis forcedto performsome(if not all)of its builds from scratch. The slow turnaroundtime of suchscratchbuilds in-creasesthe time requiredfor developmentand testing. Incrementalbuilding, onthe otherhand,allows many developersto efficiently edit, build, debug, andtestdifferentpartsof thesourcebasein parallel.Evenlargeintegrationbuildsthatcom-binework from many developerscanbeacceleratedby incrementalbuilding—anycomponentsthathave alreadybeenbuilt, whetherin thelastintegrationbuild or inisolationby individual developers,arecandidatesfor reuse.

Goodperformancein the incrementalbuilder itself is alsoimportant.As soft-waresystemsgrow, evenincrementalbuilding canbetoo slow if therunningtimeof thebuilder(exclusiveof thecompilersandothertoolsit invokes)dependsonthetotal sizeof thesystemto bebuilt ratherthanthesizeof thechange.This problemcaneasilyarise;for example,a simpleincrementalbuilder might work by check-ing eachindividual tool invocationin thebuild to seewhetherit mustberedone.Ifthesecheckshave significantcost,suchabuilder will scalepoorly.

Consistency. A build is consistentif every derived file it incorporatesis up todaterelative to thesourcesfrom which it wasproduced.Oneway to achieve con-sistency is to performevery build from scratch. The potentialfor inconsistencyarises,however, if builds aredoneincrementally. In particular, if somederivedfileusedin a build is out of datewith respectto a sourcefile, to anotherderived file,or to any aspectof the build environmenton which it depends,the build will beinconsistentandhence,unsound.An inconsistentlybuilt programmayfail to linkor mayexhibit mysteriousbugsthatarenotevidentin thesourcecode.

An incrementalbuild canbe madeconsistentby recordingevery dependencyof everyderivedfile ontheenvironmentin whichit wasbuilt. This includesdepen-denciesonsourcefiles,otherderivedfiles,environmentvariables,thetoolsusedinthebuild, andthebuilding instructionsthemselves.Then,if any elementonwhichaderivedfile dependshaschanged,theneedto rebuild it canbedetected.

We now cometo our central thesis: Vestais an SCM systemthat scalesto

3

Page 16: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

large software, is easyto use, and producesrepeatable, incremental,and con-sistentbuilds. Although we candirectly demonstratethe claimsof repeatability,incrementality, andconsistency, our argumentsaboutscaleandeaseof usearein-direct. To supportthe latter points,we describethedesigndecisionswe madetoensurethat the systemwill scalewell andbe easyto use,andwe cite our expe-riencewith the Vesta-1and Vesta-2prototypes,both of which saw serioususe.We alsopresentperformancemeasurementsof our currentsystemon small andmedium-sizedbuilds,andextrapolatefrom thoseexperiments.

Therestof this reportis organizedasfollows. First,we describesomewidelyusedSCMsystemsandtheirshortcomings.Next, wegiveanoverview of theVestasystem,stressingthetechnicalfeaturesthatguaranteerepeatable,incremental,andconsistentbuilds. We continueby describingthe five major componentsof ourimplementationin detail: therepository, thesystemmodelinglanguage,thefunc-tion cache,theevaluator, andtheweeder. We thendiscusstheperformanceof ourVesta-2prototype,comparingit to traditionalSCM tools.Weconcludeby recapit-ulatingthestrengthsandweaknessesof theVestaapproach.An appendixpresentsthedetailedsyntaxandsemanticsof theVestasystemmodelinglanguage.

Themajorityof this reportwaswrittenmorethanfour yearsbeforeits eventualpublication. We have updatedit to reflectsubsequentdevelopments,but in someplacesthefour yeardelayreally shows,mostnotablyin theperformancechapter.

At thiswriting (January2002),Vestais aboutto becomeavailableasfreesoft-warerunningontheLinux operatingsystem.Theoriginal implementationranonlyunderTru64Unix on Alpha processors,but portsto bothAlpha Linux and32-bitIntel Linux have recentlybeencompleted.CompaqComputerCorporationhasap-provedthereleaseof theVestasourcecodeundertheLGPL [19], andit will soonbemadeavailablefor downloadfrom theVestawebsite[51].

4

Page 17: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 2

RelatedWork

Most software configurationmanagementsystemsin usetoday are a collectionof looselyintegratedtools. Thesetools wereinitially developedwith small-scalesystemsin mind, so problemsarisewhenthey areappliedto larger systems.Asa result,developmentorganizationstendto build uponandmodify thecoretoolsto suit their own needs.But becausethe coretools werenever designedto scalewell, becausethey werenot designedin an integratedway, andmostimportantly,becausethereareseriousflaws in their approachto theSCM problem,thesecus-tomizedsolutionsstill have problems.Sincethesetools areso widely used,it isworthconsideringsomeof themin moredetail.

PerhapsthemostcommonSCMtoolsin usetodayareRCS(theRevisionCon-trol System)[47, 48], CVS(theConcurrentVersionsSystem)[22], andMake [17].They are often usedtogether, with RCS or CVS handlingversionmanagementandsourcecontrol, andMake handlingsystemmodelingandbuilding. Later inthischapterwealsodiscusstheintegratedsystemsDSEE,ClearCASE,andVesta’spredecessorVesta-1. We concludethe chapterwith a brief survey of ideasfromothersystems.

2.1 RCS

RCSis a tool for storingmultiple versionsof individual sourcefiles. A file’s ver-sionhistorycanbranchinto anarbitrarily complex tree. RCSprovideslocking toenforcesourcecontrol,andincludestools for merging changesmadeby differentdevelopers.

RCSstoresmultiple versionsof asourcefile in a singlediskfile, in away thatavoids duplicatingmaterialthat is commonto morethanoneversion. This tech-niquesavesdiskspace,but makestheindividualversionsinconvenientto access:a

5

Page 18: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

particularversioncannotbeaccesseddirectly, but mustbeextractedinto aseparatefile first. In amoderncomputingsystem,thebenefitsof storingmultipleversionsofasourcefile compactlyareslight,sincethediskrequirementsof mostdevelopmentenvironmentsaredominatedby thespaceconsumedby derivedfiles. Moreover, theprice of disk spaceis droppingexponentially, while theprogrammerswho createsourcefilesarenot typingany faster.

In placeof RCS,someorganizationsuseSCCS(theSourceCodeControlSys-tem)[43], anolderbut essentiallysimilar system.

2.2 CVS

Onedisadvantageof both RCSandSCCSis that sourcefiles areversionedindi-vidually. Although RCSprovidesmechanismsfor tagging a groupof versionedfiles, thosemechanismsaremanualanderror-prone.CVS attemptsto remedythisproblem. CVS is a front-endto RCS that extendsthe notion of versioncontrolfrom individual files to arbitrarydirectorytreescalledmodules. In CVS, theunitof checkout is anentiremodule.Hence,it is easierto work on a groupof relatedfiles with CVS. A drawbackis thatwhenever a userchecksout a modulefor thefirst time,all thefilesarecopied,whichcanbeslow.

There is anotherimportantdifferencebetweenCVS and the other systems:CVS doesnot uselocking to enforcesourcecontrol. Instead,it usesanoptimisticconcurrency controlmodelin which eachdeveloperis free to modify (a copy of)any sourcein the centralrepositoryat any time. CVS includesa facility for me-chanicallymerging changesmadeby otherdevelopersinto one’s own sourcetree.Developerstypically applythis facility just beforecheckingin their own changes,without giving muchthoughtto whetherthe two setsof changesarecompatible.Theassumptionis thatif thechangesdonotbothaffectthesameregionof thesamefile, thereis no problem.Changesin thesameregion arereportedasconflicts,andthe developeris requiredto fix thembeforecheckingin. But if two developersmake semanticallyconflicting edits to differentfiles, or even to distinct portionsof thesamefile, suchconflictingchangesarenot reported.Despitethesedangers,somepeopleprefertheCVS approachto concurrency control.

A problemwith theCVS implementationis thatoperationson theCVS repos-itory arenot atomic: if userA is checkingin a modulewhile userB is checkingitout, B maygetsomebut notall of A’s changes.Hence,therearetime windows inwhichuserscanseeinconsistentversionsof thesameCVSmodule.1

1This problemis documentedasexisting in the currentreleaseof CVS at this writing, version1.11.1p1.

6

Page 19: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

2.3 Make

Make is a widely usedsystemmodelingand building tool. Make is driven bymakefiles, which specify the dependenciesbetweenthe componentsof a systemandprovide instructionsfor building them. When it is run, Make examinesthedependenciesandrebuilds any componentsthatareout of date.Peoplelike Makebecausethemakefile syntaxis simple(if a little cryptic), the tool is fairly easytouse,andit canbeadaptedto othertasksbesidesbuilding software.

Thebiggestproblemswith Makearethat(1) it doesnotmaintaindependenciesautomatically, and(2) many dependenciesaresimply inexpressibleor too costlyto expressin practice[18]. Becausethe dependency relation is complicatedandchangesfrequentlyover time, it is easyto specify too few or too many depen-dencies.Specifyingtoo many dependenciescanleadto unnecessarywork beingdoneduringabuild, while specifyingtoofew dependenciescanleadto inconsistentbuilds.

To alleviatethefirst problem,toolslikemakedepend[12] havebeendevelopedto computedependenciesautomatically. But makedependsuffersfrom at leasttwodeficiencies.First, it detectsonly certainkinds of dependencies,namely, depen-denciesbetweenC/C++ sourcefiles andany files directly or indirectly includedby them. Second,thereis no mechanismto run makedependautomaticallywhendependencieschange.Sincemakedependcantake a substantialamountof time torun,developerstendnot to useit asoftenasnecessary. Failureto do socanresultin inconsistentbuilds.

Therearenosolutionsto theproblemthatsomedependenciesareimpossibleortoocostlyto expressin a makefile. Evenif makedependis usedreligiously, make-files rarely if ever captureall thedependencieson theenvironment.For example,every derived file producedby a makefile dependson thebuilding instructionsinthemakefile itself, but developerstypically omit this dependency, becauseinclud-ing it would force every derived file to be rebuilt any time themakefile changed.They mustthusresortto manuallyinvalidatingor deletingthosederived files thatareaffectedby suchmakefile changes,an error-proneprocess.Otherdependen-ciesthatarecumbersomeor impossibleto specifyin Make includedependencieson theparticularversionsof tools (compilersandlinkers)used,on command-lineswitches,andon environmentvariables.

Make alsofails to scalewell. A large softwaresystemcanbe specifiedby ahierarchyof makefiles,in which onemakefile invokesMake recursively on othermakefiles. However, whenMake is invoked on the root makefile, it mustrun allthewaydown to theleavesof thedependency treeto checkfor staledependencies.BecauseMake usesfile timestampsto test whethera derived file is up to date,the testfor staledependenciesrequiresMake to determinethe last-modifiedtime

7

Page 20: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

of every source and derivedfile comprisinga system. Although someof thesetimestampsmay be cachedby the file system,they often requirenetwork round-trips to file servers and/ordisk reads. Thus, Make’s dependency checkingcanbeanextremelytime-consumingprocessfor multi-million line softwaresystems.Ironically, theMake approachrequiresall of thosedependency checksevenif theentiresystemto bebuilt is up to date!

A big problemwith Make is that it is not integratedwith any of the versionmanagementtoolsdescribedabove. Thelack of integrationmanifestsitself in twoimportantrespects.The first andmostobvious problemis that the sourcesto bebuilt mustbe explicitly checked out beforethey canbe built. SomevariantsofMake have beenhacked to do RCScheckouts on missingsourcefiles, but suchsupportis limited. The secondproblemis that explicit sourceversionsare notspecifiedin Makefiles.Putanotherway, Make providesno configurationmanage-mentsupport:it doesnot give developersa way to specifywhich sourceversionsgo together. It is thedeveloper’s responsibilityto checkout thecorrectversionsofall componentsbeforeperforminga build, a laboriousanderror-proneprocessifoneis notbuilding thelatestversion.

Anotherproblemwith Make is that it doesnot lend itself to hierarchicalde-scriptions.It is possibleto getMake to build a hierarchicallyarrangedcollectionof softwarecomponentsby invoking itself recursively, but therearesomeproblemswith doingso.Themainimpedimentsto structuringMakefilesrecursively arethatall subcomponentsmustbechecked out beforebuilding andthat thereareperfor-mancecostsassociatedwith invoking Makerecursively. Specifically, therecursionalwayshasto bedonein full—thereis nowayfor Maketo determinethataparticu-lar recursive invocationcanbeskipped,exceptby copying all its dependenciesintotheparentMakefile, which would defeatthepurposeof the recursive structuring.Thelargerthescaleof thesoftwarebeingbuilt, theworsetheseproblemsbecome.

Make’srelianceonlast-modifiedtimescanleadto inconsistentbuildsin severalways. One instanceof this problemoccurswhenbuilding an older versionof asystem.Whentheold sourceversionsarecheckedout, they mayhave timestampsthatprecedethetimestampsof thederivedfiles from themostrecentbuild. Hence,Make will concludethatthederivedfiles areup to date.Thedeveloper’s only saferecourseis to forceascratchbuild by deletingall of thederivedfiles.

Bell Laboratories’Nmake [18] addressesmostof theseproblemswith Make,aswell asothers,thoughit doesnot fully solve them. Nmake includesa built-instaticdependency generatorthatdoesits own parsingof sourcefiles (looking, forexample,for #include statementsin C code);thisgreatlyreducesthelikelihoodof omitteddependencies,but thedependenciesgeneratedcanbeoverly conserva-tive, increasingthe amountof rebuilding work needed.Moreover, new parsingsupporthasto be written whenever Nmake is usedon codewritten in a new lan-

8

Page 21: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

guage.Nmake includesanimprovedtimestampcheckingalgorithmthatfails onlywhen an erroneoussystemclock gives two different versionsof a file the sametimestamp,andit cachestimestampsandotherstateto speedup its dependencyanalysissomewhat.

2.4 DSEE

The DOMAIN SoftwareEngineeringEnvironment(DSEE)effectively addressedmany of theproblemswith Make andstandardversionmanagementsystems[31,30].

For sourcecontrol, DSEEuseda customfile systemthat allowed individualversionsof sourcefiles to be nameddirectly; no separatecheckout stepwas re-quired. However, sourcefiles werenot usually referencedusingexplicit versionnumbers.Instead,files werenamedwithout versionnumbers,andtheuser’s cur-rently selectedconfiguration threadwould bind a file nameto a particularversionof the file in the file system.A configurationthreadwasa list of rulesfor mak-ing suchassociations.For example,a configurationthreadcould specifythat thechecked-outversionof afile wouldbeusedif oneexisted,or thatthelatestversiononthemainbranchwouldbeusedotherwise.Hence,themeaningof anamecouldchangeover time. In particular, anactiontakenby onedevelopercouldchangethemeaningof a nameusedby anotherdeveloper, a dangeroussituation.Whenmanydevelopersareworking on a systemin parallel,any oneof themcould“breakthebuild”, therebyinterruptingeveryoneelse’s work.

For building,DSEEreadsystemmodelsthatenumeratedthesourcesto bebuilt,theirdependencies(e.g.,headerfiles),andthebuild rulesfor constructingthesoft-ware. DSEEprovided automaticderived file managementandthe capability forthederivedfiles producedby onebuild to bereusedin another. Unfortunately, theDSEEpapersdo not describethesystemmodelinglanguagein any detail,nor dothey discusseithertheconsistency guaranteesor theperformancecharacteristicsofbuilding with derivedreuse.

In additionto versionmanagementandsoftwareconstructionfacilities,DSEEalso includedwork flow facilities requiredby the broadersoftware engineeringprocess.For themostpart, thesefacilities wereindependentof theconfigurationmanagementfacilitiesdescribedabove,but therewasasmalldegreeof integrationbetweenthem.For example,checkingin a sourcemodulemight causea taskon atasklist to berecordedasbeingcompleted.DSEEalsoincludedafacility for spec-ifying human-sensiblesemanticdependenciesbetweensources.For example,thedependencebetweena program’s interfacecodeandits documentationmight berecordedasa semanticdependency. Whenever theprogram’s interfacewasmodi-

9

Page 22: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

fied,atechnicalwriter wouldbeinformedof theneedto updatethedocumentation.

2.5 ClearCASE

In the early 1990s,the DSEEdevelopersstarteda company to build ClearCASE[4], a commercialSCM systembasedon the DSEEphilosophyandnow sold byRationalSoftwareCorporation.

ClearCASE’s sourcecontrolmechanismis quitesimilar to DSEE’s. In Clear-CASE,theconfigurationthreadsarecalledviews, but theideais exactly thesame:rulesareusedto mapunversionedfile namesto particularsourceversions.Clear-CASE views areimplementedby a custommultiversionedfile systemthat plugsinto the operatingsystem’s file systemswitch. As in DSEE,looking up a file ina ClearCASEview requiressomeform of databaseaccess,soanextra stepis re-quiredon eachfile access.

ClearCASEdiffersfrom DSEEin two importantrespects.First,ClearCASEismoreportable.It runsonbothWindows andUnix systems.Second,ClearCASEisMake-based.Thatis,ClearCASEsystemmodelsretainthesyntaxandsemanticsofMake. However, ClearCASEincludesanalternative builder calledClearMake thatcorrectsmany of Make’sproblems.In particular, ClearMakeincludesamechanismthat,duringabuild, automaticallyrecordsthefile dependenciesandresultsof eachexternaltool invocation(e.g.,compilersandlinkers). Thesecacheddependenciesand resultsare thenusedduring subsequentbuilds to bypasstool invocationsifthespecifiedfiles areunchanged.This producesmorereliableincrementalbuildsthanstandardMake, sinceit doesnot dependfor correctnesson dependency listscreatedby a user. However, build-orderdependenciesanddependencieson filesoutsideof ClearCASE’scontrolmuststill belistedexplicitly.

Although the test for staledependencieshasbeenautomated,it is no fasterthanwith standardMake. Only invocationsof externaltoolsarecached,soaswithMake, ClearMake builds programsupwardsfrom the leaves, ratherthan down-wardsfrom theroot. As notedabove in thediscussionon Make, thisapproachhasseriousperformanceproblemsin building largesystems.2

Oneadvantageprovidedby ClearMakeoverstandardMakeis thatderivedfilesaremanagedby thesystem,andcanbeshared.Hence,developerscanbenefitfromeachother’s builds. However, heuristicsareusedto selectthe candidatederivedfiles for sharing,sotheexactcasesin which suchsharingis possiblearenot clear.It is possiblefor the heuristicsto fail to selecta valid candidatefor sharing,inwhich casean unnecessarytool invocationwill occur. In Vesta,theseeventsare

2In Vestaparlance,theClearMake strategy is analogousto cachingonly run tool calls. Weshow in Section9.2.3thatmuchbetterperformancecanbeachievedby cachinglargerunitsof work.

10

Page 23: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

termedfalsecachemisses, andthey arequiterare.Worse,ClearMake’sdependencydetectionis incomplete,soit cansometimesproduceinconsistentbuilds. In Vesta,dependency detectionis automaticandcomplete,and inconsistentbuilds do notoccur.

TheClearCASEMultiSite productsupportsreplicationof ClearCASEsourcerepositoriesacrossmultiple,geographicallydistributedsites[3]. Vesta’s approachto replication(Section4.4)differssharplyfrom thatof ClearCASE.In ClearCASE,thechoiceof whatto replicateis madeat a coarsegrain—anentireVersionedOb-ject Base,of which thereare typically only oneor a few per site. In Vesta,onecanchoosewhat to replicatedown to the level of individual versionsof softwarepackages,if desired. ClearCASEreplicasexhibit eventualconsistency; that is,an updatealgorithmis usedthat would eventuallymake the replicasidentical ifthey all were to stopchangingfor sufficiently long, but thereareno clearguar-anteeson what differencescan exist betweenreplicaswhen changeshave beenmaderecently, andan unchangingbackupcopy of old versionsis not consistentwith currentreplicasin any usefulsense.Vestadefinesa simple,flexible notionof consistency for its replicasthat takesadvantageof thefact thatsourcesareim-mutableoncethey have beenaddedto therepository. ClearCASE’s replicaupdatealgorithm is operation-basedand requiresknowledgeof the full set of replicas;that is, eachClearCasereplicamustkeepa historyof recentoperationsthathavechangedit andkeeptrackof which changeshave not yet beenpropagatedto eachotherreplica. Vesta’s algorithmis state-basedandworkswhenthesetof replicasis unknown andchanging;the replicationtool simply comparesthestatesof tworeplicas,copying any datafrom thefirst that is missinganddesiredin thesecond.TheClearCASEapproachhassomeadvantages;in particular, feweradministrativedecisionsarerequiredasto whatto replicate,andtheoperation-basedapproachtoupdatesshouldscalebetterwhenreplicasshareagreatdealof databut very little ischanging.However, theVestaapproachis muchsimpler, providesaclearlydefinedlevel of consistency, supportsusagepatternswherethe replicasaremorelooselycoupled,andhasperformedwell in ourexperience.

2.6 Vesta-1

The Vesta-2systemdescribedin this report is a follow-on to the earlierVesta-1work [11, 13, 24, 32]. BothVesta-1andVesta-2werebasedontheideaof buildingfrom immutablesourcefiles usingcompletebuild descriptions.As a result,Vesta-1 sharedVesta-2’s benefitsof producingrepeatable,incremental,andconsistentbuilds of large-scalesoftware.

Vesta-1saw extensive useat our lab for a period of about15 monthsby a

11

Page 24: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

groupof 25 programmers.It wasusedto build a sophisticatedexperimentalsoft-wareenvironment,whichincludedacustomoperatingsystem,afile server, anovelcompiling system,an experimentalmulti-threadedwindow system,a numberofmathematical,graphical,andsymbol-manipulationlibraries,andaspectrumof ap-plications,includinga text editor, agraphicaleditor, ray-tracingsoftware,analgo-rithm animationsystem,somemathematicaltools,andVesta-1itself. In aggregate,this codebasecomprised1.4 million sourcelines. Most of thecomponentswerebuilt with the experimentalcompiling system(which wascontinuallyevolving),andweretargetedfor two differentplatforms:a VAX multiprocessorrunningourin-houseoperatingsystem,anda MIPS uniprocessorrunningUnix. In short,thisworkloadprovided a serioustestof the Vestaapproachon a scalecomparabletomany real-world developmentprojects.

Although Vesta-1demonstratedthepromiseof theVestaapproach,like mostexperimentalprototypesit sufferedfrom severaldesignandimplementationflaws.Vesta-2wasdesignedandimplementedto addresstheseproblems. In particular,Vesta-2’s cachingis morecomprehensive thanVesta-1’s, its systemmodelinglan-guageis simpler, its performanceis better, its repositoryis moregeneral,anditsimplementationis moreportable.

Throughoutthis report,thenameVestarefersto Vesta-2.Wherea distinctionwith Vesta-1is required,thenamesVesta-1andVesta-2areusedexplicitly.

2.7 Other Systems

Wehavechosentodiscussonlyafew softwareconfigurationmanagementtoolsandsystemsin thischapter, but numerousothersaredescribedin theresearchliterature,availablecommercially, or availableasopensource[2, 5, 8, 14, 33, 34, 37, 41, 46,49, 50]. Many of thesesystemsdo their versionmanagementandsourcecontrolusing paradigmsvery similar to that of RCS or CVS, and a greatmany handlesystemmodelingandbuilding usingversionsof Makewith variousimprovements.Thus, althoughRCS,CVS, and Make are now very old tools, they are still alltoo representative of thestateof the industry. Of course,thecommercialsystemsoften have sophisticatedfeaturesfor bug tracking,workflow management,high-level projectdependencies,andgraphicaluserinterfacesthatgo beyondthescopeof whatwe have attemptedin Vesta,while the researchandopensourcesystemsexploreawide varietyof ideas.

Somesystemsusealternative approachesto versionmanagement.Severalsys-temsnameandmanagethesetsof changes in a softwareartifact ratherthancom-pleteversionsof theartifact itself [8, 37, 49]. Onemotivation in suchsystemsistheideathatuserscansynthesizemany differentversionsof anartifactby mixing

12

Page 25: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

andmatchingvariouschangesets.We areskepticalaboutthis idea,sincechangesfrom differentsetsseemquite likely to conflict with oneanother. Somesystemsversioneveryobjectin ahierarchicaldirectorytree,includingthedirectoriesthem-selves[33, 34, 50], while othershave evenmorecomplex modelsfor versionman-agement[16]. Theseapproachesseeminteresting,but we did not seetheneedforthemandthereforechosea moreconservative modelof versionmanagementforVesta.

13

Page 26: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 3

SystemOverview

In thischapter, wedescribeVesta’s maincomponents,thefoundationsof theVestaapproach,andthesystem’s designparameters.

3.1 SystemComponents

Figure3.1showsthemajorcomponentsof theVestaimplementation.In thefigure,componentsthatarehighly visibleto theordinaryuserappeartowardtheleft; thosethataremostlyhiddenwithin theimplementationor visible only to administratorsappeartowardtheright. Thecomponentsin thebottomrow aresharedservers;ateachVestainstallation,or site, thereis exactly oneinstanceof each. In contrast,thecomponentsin thetop row canrunon any usermachine.

We now give a brief overview of eachcomponent,thendescribein a bit moredetailhow thecomponentsinteract.

The repositoryserver(lower left) is responsiblefor long-termdatastorageinfilesanddirectories.Versioned,immutablesources1 arestoredin immutablesourcedirectories. Developersusemutableworking directorieswhile editingsourcestocreatenew sourceversions.Vesta’s build processusestemporary build directorieswhile running tools like compilersand linkers to createnew derived files; thesedirectoriesarenot directly visible to users.The immutablesourcedirectoriesandthemutableworking directoriesaretypically mountedas/vesta and/vesta-work , respectively.

Standard file browsingand editing tools are not a part of Vestaproper, butwe includethemin thediagramto stressthatdeveloperscanuseall their familiar

1By source, we meanany file storedin theVestarepositorythat is not derivedmechanicallybythe Vestabuilder. Hence,even files suchasbuild tools andlibrary archivescopiedinto the Vestarepositoryareconsideredto besources.

14

Page 27: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Standardfile browsingand editingtools

Repositorytools(checkout,checkin,etc.)

Evaluator

Standardbuild tools(compilers,linkers, etc.)

Weeder

Immutablesourcedirectories

Mutableworkingdirectories

Temporarybuilddirectories

Functioncacheentries

Repository server Function cache server

Runtool server

Figure3.1: Themajorcomponentsof theVestaimplementation.

text browsing,comparing,editing,andprocessingtoolson files in the repository.Developersinvoke therepositorytoolsto createnew sourceversionsor to dootheroperationson the repositorythat do not fit directly into the standardfile systemparadigm.

Theevaluator is Vesta’s builder; it evaluatessystemmodelswritten in Vesta’ssystemdescriptionlanguageto constructcompletesoftware systemsfrom theircomponents.Theevaluatormakesuseof the runtool serverto run standardbuildtools like compilersandlinkerswhenneeded;it makesuseof the functioncacheserverto storeintermediateandfinal resultsof eachbuild for laterreuse.

Finally, theweederis a garbagecollectorfor long-termstorage;it triggersthedeletionof cacheentriesandderivedfiles whenanadministratordeclaresthey areno longerneeded.

3.1.1 SourceControl Components

Figure3.2 highlightsVesta’s sourcecontrol componentsandshows how they in-teract.Chapter4 describesthesecomponentsin detail.

The Vestarepositorystoresimmutablesourcesin a hierarchicalnamespace,similar to a Unix or Windows directory tree. Every versionof every sourceisincludedin the tree;by convention,differentversionsof thesamesourcearedis-

15

Page 28: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Standardfile browsingand editingtools

Repositorytools(checkout,checkin,etc.)

Evaluator

Standardbuild tools(compilers,linkers, etc.)

Weeder

Functioncacheentries

Function cache server

Runtool server

Immutablesourcedirectories

Mutableworkingdirectories

Temporarybuilddirectories

Repository server

OrdinaryNFS access

SpecialVesta primitives

Figure3.2: Sourcecontrolcomponentsandtheir interactions.

tinguishedby having a versionnameor numberasonepathnamecomponent.Asshown, therepositorymakesthis treeavailableasanetwork-accessiblefile system,using the standardNFS protocol [44]. Thus,ordinaryfile browsing andeditingtoolsrunningonany userworkstationthatsupportsNFScanaccessall versionsofall sourcesdirectly.

In therepository, sourcesareconventionallyorganizedin packages. A packageis a collection of relatedfiles, suchas the sourcesto build a single programorlibrary. By convention,Vestasourcesareversionedat thepackagelevel, notat thelevel of individual files. Thusa versionof apackageconsistsof a directorytreeofrelatedfiles. Contrastthiswith themoreconventional(RCS)methodof versioningeverysourcefile, whichprovidesnonaturalmethodfor identifyingwhichversionsgo together.

Vestausesa checkout-checkinsourcecontrolparadigm,but theprocessworksin aslightly unusualway. Becausesourcefilesareimmutable,acheckinoperationneverdeletesexistingfilesor renderstheminaccessible.Instead,checkout-checkinoperationsaddto thenamespaceof packageversions.Checkingoutapackagere-serves a versionnameand makes a mutableworking copy of the existing filesandsubdirectoriesfrom the package’s previous version(if any). Standardtools

16

Page 29: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

canthenbe usedto modify, create,delete,or renamefiles anddirectoriesin theworkingcopy. To ensurethatbuilds arereproducible,thebuilder operatesonly onimmutablesnapshotsof theworking copy, not on theworking copy itself. Check-ing in thepackagebindsthepreviously reservedversionnameto thefinal snapshotof theworking copy. Checkin,snapshotting,checkout, andotherrepositoryoper-ationsthat do not fit the NFS file accessparadigmarehandledby the repositorytools;asshown in Figure3.2,thetoolswork by invoking specialrepositoryprimi-tivesthroughanRPC(remoteprocedurecall) interface.

To supportdevelopmentof softwareacrossgeographicallydistributedsites,therepositoryserveratonesitecanreplicatesomeor all of its sourcesfrom repositoryserversat othersites. Vesta’s supportfor suchpartial replication is describedinSection4.4.To achievepartialreplication,repositoryserversatdifferentsitescom-municatewith eachotherthroughtheRPCinterface(not shown in thefigure).

3.1.2 Build Components

Figure3.3 highlights the componentsthat participatein Vestabuilds andshowshow they interact. Building is quite a complex process,involving many compo-nentsthatinteractin subtlewaysto ensurethatbuilds arereproducible,incremen-tal, andconsistent.

The Vestaevaluator is the centerof the build process. The evaluatorreadsa systemmodelandactson it, building what the modeldescribes.As shown inFigure3.3(arrow 1), modelsarealwaysreadfrom theimmutablepartof therepos-itory, to helpensurethatbuilds arealwaysreproducible.A modeldescribeshowto build a softwareartifact from source,andthesourcesit refersto arealsostoredin the immutablepart of the repository. Modelsarewritten in the Vestasystemdescriptionlanguage(SDL), asmallfunctionalprogramminglanguagewhosedatatypesandprimitivesarespecializedfor softwareconstruction.Chapter5 describesthelanguagein general,AppendixA givesits completesyntaxandsemantics,andChapter7 describestheevaluator.

Whenever the evaluatorencountersa function call in a model,it looks in thefunctioncache(arrow 2) to seeif asufficiently similar call hasalreadybeenevalu-atedin apreviousbuild. If so,it readstheresultfrom thecacheinsteadof evaluat-ing thefunctionagain.Functioncachehits canoccuratany level in thecall graph,from theleaves(usuallyindividual callsto astandardbuild tool suchasacompileror linker)upto theroot(theentirebuild beingrequested).UnlikeVesta,mostotherbuild systemscacheonly at the leaves,andthusdo not scalewell to largebuilds.Chapter6 describesthefunctioncache.

Whatdoesit meanfor a previous functioncall to besufficientlysimilar to thecurrentone? In more detail, we needthe samefunction to have beencalled in

17

Page 30: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Standardfile browsingand editingtools

Repositorytools(checkout,checkin,etc.)

WeederEvaluator

Standardbuild tools(compilers,linkers, etc.)

Immutablesourcedirectories

Mutableworkingdirectories

Temporarybuilddirectories

Functioncacheentries

Repository server Function cache server

Runtool server

1) Read models andsource directories

2) Look up calls6) Cache results

3) Invoke tools

4) Read sources,read/write deriveds

5) Accessdirectories

Figure3.3: Componentinteractionsduringa build.

a sufficiently similar namingenvironment, including both argumentsandnamesboundfrom thefunction’s staticscope,thattheresultis guaranteedto bethesame.It would of coursebe correctto checkthat all namesin the environmentarethesame,but doingsowouldgiveusanunacceptablylow cachehit rate.For example,whenthe C compiler is invoked, all of the .h files in /usr/includearein its envi-ronment,but a typical .c file usesonly a few of these.h files. The resultof thecompilationdoesnot dependon the whole environment,only on the part that isactuallyreferenced.

Therefore,whenVestaevaluatesafunction,it recordsthedynamicfine-graineddependenciesof the function’s result on its namingenvironment. By dynamic,we meanthat theevaluatorrecordsonly what is referencedduring this particularevaluation.By fine-grained, we meanthatwhenonly partof a compositevalueisreferenced,theevaluatorrecordsa dependency on just thatpart,not on thewholevalue. In theC example,if a particular.h file is not used,no dependency on it isrecorded.Oncachelookups,then,ahit occurswheneverwecanfind acacheentry

18

Page 31: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

for thecurrentfunctionwhosedependencieswereboundto thesamevaluesin boththeentry’s original environmentandthecurrentenvironment.Vesta’s dependencyanalysisdoesnotmakeuseof any knowledgeof how thebuild toolswork; it is thussemantics-independent in theterminologyof Gunter[23].

When a call to a build tool missesin the cache,the evaluatormust invokethe tool itself to do the work. It doesso by making a remoteprocedurecall tothe runtool server (arrow 3), which is responsiblefor startingthe tool, waitingfor it to complete,andreportingits outcomebackto theevaluator. We placethisfunctionality in a separateserver so that theevaluatorcaninvoke toolson remotemachines.This lets us supportparallelcompilation(by invoking runtool serverson multiple machines),andcross-platformdevelopment(by invoking the runtoolserver on a machinewith a differentarchitecturefrom the local machine,whenacross-compileris notavailable).

Build tools are run in an encapsulatedenvironment. That is, Vestacontrolsnot only the tool’s commandline andenvironmentvariables,but also the entirefile systemcontentthat the tool sees.Moreover, Vestamonitorsandrecordseachfile systemreferencethat thetool makes.We accomplishthis by forcing all of thetool’s file anddirectoryreferencesto go throughtherepository(arrow 4).2

The files anddirectoriesthat an encapsulatedtool seesaredefinedasa datastructurewithin the ongoingmodelevaluation,but it is the repositorythat man-ifests this datastructureto the tool, asa treeof temporarybuild directories(seefigure).Theevaluatordoesnotpassthedatastructureto therepositoryall at once;instead,whenever thetool makesa directoryreferencethat therepositoryhasnotseenbefore,therepositorycallsbackto theevaluator(arrow 5) to obtainthecor-respondingvalue,andtheevaluatornotesthereferenceasa dependency. Thustheevaluatoris ableto recordfine-graineddependenciesnotonly for functionswrittenentirelyin theVestaSDL, but alsofor tool invocations.At theendof a tool execu-tion, therepositoryreportsto theevaluatorwhatnew files anddirectoriesthetoolcreated(andany otherchangesthetool made)asits output.Section4.2.2describestherepositorymachineryfor tool encapsulationin moredetail.

After the evaluatorfinishesexecutinga function, it writes a new cacheentry(arrow 6) to recordthefunctionresultandits dependencies.Vestausesapersistent,sharedcacheserver sothatabuild donetodaycanbenefitfrom work alreadydonein thepast,andsothatabuild requestedby oneusercanbenefitfrom work alreadydoneon behalfof anotheruser.

As a final step,not shown in the figure, theevaluatorcanoptionally ship theresultsof thebuild. That is, it cancopy out someor all of theresultsof theevalu-

2Under Unix, we usethe chroot systemcall to redefinethe tool’s root directory (“/”), thusensuringthatit canreferenceonly filesanddirectoriessuppliedby therepository.

19

Page 32: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Standardfile browsingand editingtools

Repositorytools(checkout,checkin,etc.)

Evaluator

Standardbuild tools(compilers,linkers, etc.)

Runtool server

Weeder

Immutablesourcedirectories

Mutableworkingdirectories

Temporarybuilddirectories

Functioncacheentries

Repository server Function cache server

List ofbuildsto keep

3) Scan directories4) Delete source and derived files

1) Scan cache2) Delete entries

Directorystorage

Source and derivedfile storage

Entrystorage

Figure3.4: Disk storageandtheweeder.

ation’s top-level functioncall to permanentfiles anddirectories.Notethateveniftheresultsof abuild arenot shipped,they remainin thefunctioncacheandcanbequickly retrievedby repeatingthebuild.

3.1.3 StorageComponents

Figure3.4 shows the threepoolsof long-termdisk storageusedby Vestacompo-nentsandillustratestheoperationof theweeder, anadministrative tool for reclaim-ing storagethatis no longerneeded.

As shown at thebottomof thefigure,therepositoryhasa privatestorageareafor directoryentriesand the function cachehasa privateareafor cacheentries,but they sharea commonpool of storagefor sourceandderived files. This poolis managedusinggarbagecollection: whenneithera sourcedirectoryentry nor

20

Page 33: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

a functioncacheentry pointsto a file, it canbe deleted.At different timesin itshistory, thesamefile canbepointedto by a directoryentry, acacheentry, or both.Section4.2.1describesthefile pool in moredetail.

As buildsareperformed,cacheentriesandderivedfilesaccumulatein thefunc-tion cacheandfile pool, andcaneventuallygrow to fill the availabledisk space.Fortunately, any or all cacheentriescanbedeletedwithoutaffectingtherepeatabil-ity or consistency of builds; however, somefuturebuilds thatwould otherwisegetcachehits (andhencebeincremental)mayhave to bedonepartly or entirelyfromscratch.Decidingwhich entriesarebestto remove andwhich shouldbe retainedis a task that cannotbe entirely automated;usersandVestaadministratorsmustdecidewhichonesareworthkeeping,basedon theirknowledgeof whatbuilds arelikely to berequestedin thefuture.

Unwantedcacheentriesandderived files aredeletedby theVestaweeder, anadministrative programthat is run periodically. Given a specificationof whichpackagebuilds to keep,it deletesall cacheentriesthatdid not participatein thosebuilds (steps1 and2 in the figure). It thencontactsthe repository(steps3 and4) anddeletesall files in thesharedpool thatareneitherpointedto by remainingcacheentriesnor pointedto by the repositorydirectorystructure.Sinceweedingcantake a relatively long time (minutesto hours),the functioncache,repository,andweederhave beendesignedso that the weedercanbe run concurrentlywithclientbuilds without adverselyaffectingnormalbuild performance.Theweederisdescribedin Chapter8.

3.1.4 Modelsand Modularity

Vestasystemmodelsaremodular—that is, eachmodelcanimport othermodelsandusethefunctionsthey define.A modelcandescribehow to build a collectionof sourcesinto asubsystem,thenexport thatsubsystemasaunit, for useby higher-level modelsthatassemblethesubsystemsinto acompletesystem.

Modularity is essentialfor scalabilityto largesystems,but it is alsoimportanteven for building small programs. In today’s programmingenvironments,evena small “hello world” programis compiledandlinked againsta large runtimeli-braryof input/outputandoperatingsysteminterfaceroutines.Moreover, thecom-mandsneededto invoke tools like compilers,linkers,stubgenerators,andthelikecanbe complex. Vestaprovidesa standard environmentmodelthat encapsulatestheselibraries and commonbuilding actionsand makes them available to user-written modelsin a simpleform. Thestandardenvironmentcanbequitecomplexinternally—forexample,it canbuild someor all of thestandardlibrariesandtoolsfrom source—withoutexposingany of thesecomplexities to theordinaryuser. Atthe sametime, becauseit is written asa model ratherthanbeinghardwiredinto

21

Page 34: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Vesta’s implementation,moreadvanceduserscanextend,modify, or replacethestandardenvironmentto suit theiruniqueneeds.

Modelsfor ordinaryapplicationsaretypically written so that they canbe im-portedtoo. Onecan thenwrite a releasemodelthat importsoneversionof thestandardenvironmentandbuildsawholecollectionof applicationsagainstit. Thusa developmentshopthatmaintainsa suiteof applicationscaneasilycreatea con-sistentrelease,in which all theapplicationsareknown to have beenbuilt with thesametoolsagainstthesamelibraries.

Thesystemmodelinglanguage,systemmodels,andthestandardenvironmentaredescribedmorefully in Chapter5.

3.2 Foundations

WehaveclaimedthatVestaprovidesrepeatability, consistency, incrementality, andscalabilityin softwareversionmanagementandbuilding. Thefoundationfor theseclaimslies in Vesta’s novel combinationof structuralfeatures.Now thatwe havegiven an overview of thesefeatures,we arepreparedto summarizethe essentialcontributionsof eachoneto Vesta’s overall goals.

Immutable, Immortal, VersionedSources. All Vestabuilds areperformedonimmutable,immortal, versionedsources.Beforeeachbuild, a developer’s muta-ble sourcesarecopiedto an immutable,versionedform. The immutability andimmortalityof sourcesareessentialfor achieving repeatablebuilds.

Complete,Source-BasedBuild Descriptions. A Vestabuild descriptiongivesacompleterecipefor building a softwareartifact from versionedsources.By com-pletewe meanthat no aspectof the build relieson any aspectof the computingenvironmentoutsideof Vesta’s control, including environmentvariables,libraryarchives,andbuild tools. Thecompletenatureof Vesta’s build descriptionsis es-sentialfor achieving consistentbuilds. Becausebuild descriptionsnameversionedsources,themeaningof anamecannotchangeover time.

Automatic DependencyDetection. A build systemthat aspiresto performcon-sistentbuilds must know the dependenciesof every derived file. Vestadetectsand recordsall suchdependenciesautomatically. By using automaticdetectionratherthanrelying on user-supplied(andthuserror-prone)dependency specifica-tions,Vestacancollectall theinformationneededto determinewhen(re)buildingis necessary, andcantherebyensurethattheresultsof its builds areconsistent.

Caching. To supportincrementalbuilding, Vestaautomaticallycachestheresultsof tool invocations(andtheir associateddependencies)for later re-use.For scala-bility, largerunitsof work — suchastheconstructionof entirelibraries— arealso

22

Page 35: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

cached.BecauseVestausesa sharedsite-widecache,eachdevelopercanbenefitfrom thebuilds of all others.

Hierarchical Build Descriptions. The Vestasystemmodelinglanguageallowsbuild descriptionsto bestructuredhierarchically. Hierarchical,modularstructuringis theonly effective way we know to describelarge-scalesoftwareconfigurations.It also permits re-useof commonbuild functionality, suchas that provided byVesta’s standardconstructionenvironment.

SourceReplication. As softwaregrows larger andaslong-distancetelecommut-ing becomesmorecommon,theneedfor developmentof softwaresystemsacrossmultiple, geographically-distributed sitesincreases.However, groupsat differentsitesmay needto shareonly someof their sources.Vestaallows someor all ofthesourcesatonesiteto bereplicatedatothersites,usinganalgorithmthatavoidscopying sourcesunnecessarily. Moreover, controlof eachpackagecanbeshiftedto the site developing it most actively, allowing usersat that site to createnewversionsautonomously.

3.3 DesignTargets

BeforedesigningandimplementingVesta-2,weneededto decidehow largeasoft-waresystemwe expectedit to be able to handle. That decisionhad immediateconsequenceson theloadimposedoneachof thesystem’s componentsandon thesystem’s overall resourceusage.Oncechosen,thesetargetsalsoplayeda criticalrole in oursubsequentengineeringdecisions.

The Vesta-2designtargetsgiven below were basedon our experiencewithVesta-1,whichhadbeenusedto build anactively changingcodebaseof 1.4millionsourcelines. Our goal for Vesta-2wasthat it build softwaresystemsat leastanorderof magnitudelargerthanthosebuilt with Vesta-1.

CodeSize.Vesta-2shouldbeableto build asystemcomprisingapproximately20million linesof sourcecode.

Derived Files. Thereshouldbeapproximately1 million derivedfilesof interestateachsite.Hence,thesystemshouldbeableto accommodate2 million derivedfilescomfortably(beforeweedingbecomesnecessary).

CacheEntries. We expect 5–6 cacheentriesfor eachderived file. Hence,thefunctioncachemustsupport10–12million cacheentries.

The implementationshouldbe able to comfortablyaccommodatea softwaresystemof this size. Moreover, whenusedto build somewhat larger systems,the

23

Page 36: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

systemshouldcontinueto work; it shouldnot suffer any majorperformanceprob-lems. Our intent in settingthesetargetswasnot to placean upperboundon thesizeof systemVestacouldhandle,but a lower bound.Thetargetsremindedusto“think big” throughoutthedesignprocess.

Althoughwe did not have theopportunityto testtheVestaimplementationonasoftwaresystemof thissize,wewereableto applyit to largeenoughproblemstopartly validateourdesigndecisions.Examplesof designdecisionsthatweremadewith aneyetowardscalabilityappearthroughoutthis report.Wepresenttheresultsof ourperformancetestingin Chapter9,andwebriefly discussourexperiencewithrealusersin Chapter10.

24

Page 37: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 4

Repository

The Vestarepositoryis responsiblefor long term storageof sourceand derivedfiles. Therepositoryprovidestwo distinctsetsof servicesto two kindsof clients:usersandtheevaluator. Usersaremainly concernedwith sources;they readexist-ing sourcefilesandcreatenew ones.Theevaluatoris concernedwith bothsourcesandderiveds; it readssystemmodels(which aresources),and it runs tools thatreadsourcesandderivedsandwrite out new deriveds.Thesourcestorageserviceandits replicationsupportaresufficiently comprehensive andindependentthattherepositorycouldbeusefulalone,without theevaluatoror therestof Vesta.

Therepositoryis implementedin two layers.The repositoryserveris respon-sible for sourcenamingandstorage,while the repositorytoolsprovide a userin-terfaceto theserver. Thedistinctionbetweentheserverandthetoolsis significant.Thetoolswe have built supporta particularstyleof namingsourcesandcarryingoutcommondevelopmenttasks,but therepositoryserver itself is quitegeneralandcould supportotherstylesof use. Most of the complexity of thesystemis in therepositoryserver. Theexisting repositorytoolsareshortcommand-lineprograms,eachlessthan1000linesof code.

In thenext threesections,we discussthemainfeaturesof therepositoryfromtheuser’spointof view, from theevaluator’spointof view, andfrom theimplemen-tor’s point of view. In thefinal sectionof thechapter, we discusstherepository’ssupportfor sourcereplicationandthereplicationtoolswe have built.

4.1 The User’sView

Thissectiondescribestheaspectsof therepositorythataredirectlyvisibleto users.

25

Page 38: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

4.1.1 SourceNamespace

TheVestarepositoryprovidesa hierarchicalsourcenamespace,similar to a Unixor Windows directory tree,but with a few additionalfeaturesandrestrictionstosupportconfigurationmanagement.

For userconvenience,Vestamakesthesourcenamespaceavailableasasubtreeof the file namespaceon eachclient machine.Userscanapply all their existing,familiar toolsfor browsingordinaryfiles anddirectoriesto repositoryfiles anddi-rectories.Therepositoryserver makesthispossibleby exportingthesourcetreeasanNFSvolumethatcanbeimportedandmountedby client machines.Theserveralsoexportstwo remoteprocedurecall (RPC)interfacesfor accessto featuresthatdo notmapwell ontoNFS.

To supportVesta’s goal of repeatablebuilds, all sourcefiles accessibleto theevaluatorareimmutable. Onceafile is createdandplacedin thesourcenamespace,thefile’s contentscannotbemodified. In addition,Vesta’s philosophyof softwaredevelopmentsaysthatsourcecodeoughtto beimmortal; all sourcesshouldbekeptpermanentlysothatbuildscanalwaysbereproduced.Werecognize,however, thatsourcessometimesmustbe deletedfor practicalreasons,suchasa lack of diskspaceor theexpirationof a licensingagreement.Sowe do allow sourcefiles anddirectoriesto be deleted,but to preserve consistency in builds, we do not allowa deletedsource’s full pathnameto be reusedlater for a differentsource. Thus,repeatinga build will eithersucceedandproducean identical result,or will failbecausesomesourcefile hasbeenexplicitly deleted.Thelattercasewill notoccurif usersfollow ourdevelopmentphilosophy.

Vestasourcedirectoriesthusmustnotbearbitrarilymutable.Vestain factsup-portstwo kindsof directorywith limited mutability: immutableandappendable.

Like an immutablefile, an immutabledirectorycannotbe changedonceit ispopulatedwith filesandsubdirectoriesandplacedin thesourcenamespace.Everyfile in suchadirectorymustbeimmutable,andsomustevery subdirectory, sothattheentiretreerootedat it canbetreatedasoneimmutableunit.

AppendabledirectoriessupportandenforceVesta’s rule thatnamescannotbereused.An appendabledirectoryis similar to anappend-onlyfile. New namescanbecreatedin thedirectory, but existingnamescannotbeunboundor freelyreboundto differentcontents.

We do allow certainstrictly limited forms of rebinding,involving specialob-jects called ghostsand stubs. If the Vestaevaluatorencountersa ghostor stubduringa build, it haltswith anerrormessage,doesnot producea result,anddoesnot recordtheerrorin theVestafunctioncache.Thereforethesecasesof rebindingcannotcausethesamebuild to havedifferentresultsondifferentoccasions;aswithdeletion,they canat worstcausea build thatsucceededon oneoccasionto fail on

26

Page 39: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

another.GhostssupportVesta’sdeletionsemantics.Whenauserdeletesanobjectfrom

anappendabledirectory, Vestareplacestheobjectwith a ghost. Theghostkeepstheold namefrom beingreused;Vestadoesnot allow a nameboundto a ghosttobereboundto anew object,andattemptingto deleteaghosthasnoeffect.

Stubssupportmorespecializedfeaturesof Vesta—namereservationandpartialreplication—whichwedescribein Sections4.1.4and4.4below. A stubis aplace-holderfor a file or directorythatmaybesuppliedin thefuture. It is permissibletoreplaceanexistingfile or directorywith astub,aslongasit is guaranteedthatonlythesamefile or directorywill laterreplacethestub.

BothghostsandstubsaremanifestedthroughtheNFSinterfaceaszero-lengthfiles thatcannotbereador written. Their accesspermissionbits aresetto distinc-tivevaluesthatmake themdistinguishablefrom realfiles in adirectorylisting.

4.1.2 Versioning

How can an append-onlynamespaceof immutablefiles and directoriessupportsoftwaredevelopment?Softwaresystemsthatareunderactive developmentcon-stantlygrow andchangethroughmodificationof thefiles they contain.Thesolu-tion, of course,is to adoptanamingconventionin whicheachpathnameincludesaversionnumber. Theninsteadof modifyingafile or directoryin place,onecreatesanew version.

Therepositoryservermakesno judgmentasto whatpartof apathnameshouldbe the versionnumber, but for several reasons,we have found versioningat thelevel of directorytreesto beconvenient,andhave designedtherepositorytoolstouseit.

Most programsconsistof several files that make up a logical unit, which wecall apackage. At minimumapackageincludesonefile of sourcecodeandoneofbuilding instructions.A packagecanalsobelarger, consistingtypically of severalcloselyrelatedfilesof code,interfaces,anddocumentation,perhapsorganizedin atreeof subdirectories.Large programscangenerallybe decomposedinto severalpackages,eachrelatively independentof theothers.It is convenientto storeeachpackageasaseparatedirectoryor directorytree.

Whena packageis modified,oftenseveralfiles in it mustchangetogetherforconsistency. In systemslike CVS, whereevery file is versionedseparately, a sep-aratedatastructureandtools arerequiredto keeptrack of which versionsof thefiles in a packagego togetherto make a coherentversionof the whole package.In Vesta,eachcoherentpackageversionsimply correspondsto oneimmutabledi-rectorywith onehierarchicalfile name,with theversionnumberasa componentof thename.For example,thedirectoriesthread/5 andthread/10 would be

27

Page 40: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

versions5 and10 of the threadpackage,and the files thread/5/foo.c andthread/5/foo.h would be consistentversionsof its componentsfoo.c andfoo.h. If foo.c is unchangedbetweentwo versionsof a package,the repositorylinks thesameimmutablefile into bothversiondirectories;only onecopy existsondisk.

All packageversionsare directly available for browsing and building at alltimes. A usercan easily seewhat is in a particularversionby changingto itsdirectoryandlooking around,andcaneasilycomparetwo versionswith existingtools like diff . Thereis no needto checkout a privatecopy of a packageunlessitis to bemodified.

A secondaryreasonfor versioningat the level of directoriesis that existingUnix and Windows tools are not preparedto deal with versionnumbersin filenames. Vesta“hides” the numbersfrom them by placing them earlier than thelast componentin the pathname.For example,a Unix or Windows C compileris betteradaptedto dealwith file namesof the form 2/foo.c than foo.c/2or foo.c;2 . The repositoryserver could supportany of thesethreeversioningschemes,asit allows immutablefiles to beplaceddirectly in appendabledirecto-ries,but therepositorytoolssupportonly thefirst.

Of course,whena largesoftwaresystemis madeup of severalseparatelyver-sionedpackages,we still needa way to keeptrackof which versionsof thepack-agesgo togetherto make acoherentsystem.Ratherthantrying to solve thatprob-lemwithin therepository, Vestadealswith it in thesystemmodelinglanguage;seeSection5.3.3. Onemight arguethat if a mechanismfor namingpackageversionsis requiredin the modelinglanguage,we might just aswell usethat mechanismto nameindividual file versionsanddispensewith package-level versioningin therepository. But althoughthis approachwould betechnicallyfeasible,it would befar lessconvenientfor usersthantheapproachwehave chosen—modelswouldbeclutteredwith large numbersof versionedfile namesinsteadof containingonly afew versionedpackagenames.

4.1.3 Naming Convention

A hierarchicalnamespacegivesgreatfreedomin assigningnamesto files. To avoidchaos,oneneedsto organizethenamespace:to establishconventionson how filesarenamedsothatpeoplecanfind them.Vesta’s repositoryserver doesnotenforceany particularnamingconvention,but the repositorytools do. Figure4.1 showspartof a typical repositorydirectorystructure.

The root of the subtreein the figure is named/vesta/west.ves ta sy s.org . Thisnameis chosento begloballyuniqueacrossall Vestainstallations,forreasonsdiscussedin Section4.4below.

28

Page 41: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

/vesta/west.vestasys.org

commoncxx

private

text

tablethread

1

2

2.fast

3

4

0 1 2

thread.c

thread.h

build.ves

= Immutable = Appendable = Stub = Ghost

smith jones

doc

Figure4.1: Namingconventionexample.

Below theroot is a treeof appendabledirectoriesusedfor categorizing pack-ages.In thisexample,packagesthataregenerallyusefulareplacedin thecommondirectory, packagesthat arepart of the C++ compilationsystemarein cxx , andprivatepackagesownedby usersSmith andJonesarein private/smith andprivate/jones . This directory treecanhave arbitraryshape;the repositorytoolsplacenorestrictionson it.

At thenext level down in thetreeareindividual packagessuchastext , ta-ble , andthread . The threadpackageis shown in detail in thefigure. The im-mutablesubdirectoriesthread/2 and thread/3 areversionsof thepackage;thelatteris shown ascontainingthreeimmutablefilesandanimmutablesubdirec-tory. Versionthread/1 hasbeendeleted,leaving aghostin its placeto keepthenamefrom beingreused.Thenamethread/4 is boundto astub,reservingit foranew versionthatauseris workingon andhasnot checkedin yet.

The appendabledirectorythread/2.fast is a branch of the threadpack-age. While versions1, 2, 3, . . . representthe main line of development,thever-sionsunder2.fast areanotherline of developmentbranchingoff from version2. Essentially, thread/2.fast is a new package,whoseversion0 is identi-cal to version2 of the threadpackage.Branchesoff of branchesarealsopossi-

29

Page 42: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

ble; for example,a branchfrom thread/2.fast could be namedthread/2.fast/1.bugfix .

4.1.4 DevelopmentCycle Tools

Thedevelopmentcycleis Vesta’snamefor thesequenceof stepsthatuserstypicallygothroughwhendevelopingsoftware.Theoutlinebelow showsthecycleandgivesthenameof theVestacommandusedin eachstep.Thecommandvestais theVestaevaluator;theothercommandsshown arerepositorytools.

1. Checkoutapackage:vcheckout

2. Modify thepackage

(a) Edit: any text editor

(b) Advance:vadvance

(c) Build: vesta

(d) Test

(e) Go backto step2auntil done.

3. Checkin theresults:vcheckin.

4. Optionallygo backto step1.

Outer Loop

In theouterloopof thedevelopmentcycle,onechecksoutapackage(1), modifiesit (2), andchecksin the result as the next version(3). We call one trip aroundthis cycle a session. Checkout createsa mutableworking copy of the package.Modification is the inner loop of development,discussednext. Checkinwritesan immutablesnapshotof the working copy into the repository. Both checkoutandcheckinusecopy-on-write to improve performanceandspaceefficiency, asdescribedin Section4.3.4below.

Therepositorytoolswe have implementedusea locking paradigmfor check-out. Checkingout a packagereservesa name(that is, a versionnumber)for themodifiedversionthatis to bechecked in later. No otherusercanreserve thesamename,and normally only the userwho reserved a nameis given permissiontocheckthepackagebackin underthatname.By default,vcheckout triesto reservea versionnumberthatis onegreaterthanthehighestchecked-inversion,soit willreportaconflict if two userstry to checkout thesamepackage.

30

Page 43: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

For example,in Figure4.1,auser(say, Smith)hascheckedoutthethreadpack-agewith the commandvcheckout common/thread, therebyreservingthe name/vesta/west.ve st asy s. or g/ co mmon/ th re ad/4 for the next version.The reservation is manifestedin the repositorynamespaceasa stub,ownedandwritable only by Smith. If userJonesasksto checkout the threadpackagetoo,he will be told that version4 is alreadyreserved by Smith. Jonesis thusalertedto speakto Smithandmake suretheir intendedchangeswill not be redundantorconflicting.

If Joneswantsto proceedin parallelwith Smith,hecanaskvcheckout to re-serveadifferentversionnumber, in any of severalways.In theunusualcasewhereJones’s changesare meantto completelysubsumeSmith’s, Jonescan leapfrogSmithby askingto reserve versionthread/5 . If Jonesis just makingonevari-ant version,he can choosea nameoutsidethe main sequenceof versions,saythread/3-jones . If Jonesis startinga new line of developmentthatmaypro-ceedfor several versionsbeforemerging back into the main line, or may nevermerge back in, he can createa branchusing the vbranch tool. In our exam-ple, Jonescould type vbranch common/thread/3.jones, which would createanew packagewith thespecifiedname.Versioncommon/thread/3 .j ones /0wouldbeidenticalto common/thread/3 . Jonescouldthencheckoutthebranchandwork on it asin anormalpackage.

In noneof thesecasesdoesJonesneedto “breaka lock.” Unlike RCS,Vestadoesnot lock theold versionthatSmithstartedfrom; instead,Vestalocksthenewversionby creatingastubfor it. Sothereis nothingto interferewith Jonesstartingadifferentline of developmentthatbranchesoff from thesameold version.

Returningto the main line of the outer loop, whenSmith hasfinishedmod-ifying the package,he simply typesvcheckin. The vcheckin tool replacesthereservation stub that was createdby vcheckout with an immutablesnapshotofSmith’s mutableworking copy. It alsodeletesSmith’s working copy, so that hecannotinadvertentlycontinueto edit it afterhis sessionhasended.

Inner Loop

The inner loop of the developmentcycle is the familiar edit-build-test sequence,with the additionof onestepthat is uniqueto Vesta: advance. As statedprevi-ously, the Vestaevaluatorguaranteesbuild reproducibilityby building only fromimmutablesourcesstoredin therepository. Eventheprivatebuilds thatanindivid-ual developerdoeson work heis not readyto checkin for public usearehandledin thisway. After editingandbeforebuilding, aVestadeveloperrunsthevadvancecommand.This commandtakesan immutablesnapshotof thedeveloper’s muta-ble working copy andputsit into the repositorynamespaceundera new version

31

Page 44: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

= Immutable = Appendable = Stub = Mutable

copy

createcreate

copy

4

4

0

thread

/vesta/west.vestasys.org

common

thread

checkout3

/vesta−work

jones

Figure4.2: Action of thecommandvcheckout common/thread.

numberwithin thesession.To keepthe many snapshotscreatedduring a sessionfrom clutteringup the

sharednamespace,therepositorytoolsput themin adirectorythatis separatefromthemain-lineversionsof thepackage,calledthesessiondirectory. A sessiondi-rectoryis essentiallyabranchwith aspecialname,createdby vcheckout.

RepositoryTool Details

Figure4.2 illustratesthecompleteoperationof vcheckout. Initially, thread/3is the latestversionof the threadpackage. When userJonestypesvcheckoutcommon/thread, thesystemcreatesthereservationstubthread/4 andtheses-siondirectorythread/checkou t/ 4 in therepository’s appendablesourcetree/vesta . It createsthemutableworkingcopy jones/thread in aseparatemu-tabledirectorytreenamed/vesta-work . Thesessionis givenaninitial versionthread/checkout /4 /0 whosecontentsareimmutableandidenticalto thelastchecked-inversionin themainlineof development,thread/checkout /3 . Theworkingcopy initially hasthesamecontentsaswell, but it is fully mutable.

Jonescannow edit thefiles in his working copy with any text editoror otherfile manipulationtool. He canfreely createnew files or subdirectoriesanddeleteor renameexisting ones,usingordinarycommandslike cp, mv, rm , mkdir , and

32

Page 45: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

snapshot

1

= Immutable = Appendable = Stub = Mutable

/vesta/west.vestasys.org

common

thread

checkout3

/vesta−work

jones

4

4

0

thread

Figure 4.3: Action of the command vadvance in the directory/vesta-work/jones/thread .

rmdir .Whenever Joneswantsto try compilinghis modifiedfiles, hetypesvadvance

to save an immutablesnapshotas the next higherversionin the currentsessiondirectory. Figure4.3 illustratesthe operationof vadvance. It simply makes animmutablecopy of the working directory in the package’s sessiondirectory; thenameof this copy is thenext availableversionnumberin thesession.Jonescanof courseusevadvanceevenwhenhe is not aboutto do a build; for example,hecoulduseit to checkpointhiscurrentwork in preparationfor makingexperimentalchanges.

Jonesusesthe vestacommandto invoke the Vestaevaluator(seeChapter7)andto build the latestversionin the session.He will typically usea shortshellscriptto do boththeadvanceandevaluationin onestep.

Next, if his programhasbuilt without errors,Jonestestsit. If changesareneeded,he returnsto theeditingstepandgoesaroundthe inner loop again. Anytime Jonesneedsto reexamineanold versionor backout a change,hecansimplylook back at the old snapshot—whetherit is in the currentsession,an old ses-sion,themainline of checkins,or elsewhere—andcompareor copy thefiles to hisworkingdirectory.

Finally, whenJonesis satisfied,heendstheouterloopby runningvcheckin, as

33

Page 46: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

delete

copy

= Immutable = Appendable = Stub = Mutable

thread

/vesta/west.vestasys.org

common

thread

checkout3

/vesta−work

jones

4

4

0 1

Figure 4.4: Action of the command vcheckin in the directory/vesta-work/jones/thread .

mentionedearlier. Figure4.4 illustratesthis operation.Notethatcheckindoesnotitself snapshotJones’s working copy of thepackage;instead,it usesthesnapshotmadeby themostrecentadvance,first checkingthattheworkingcopy hasnotbeenmodifiedsincethen.

Remarks

Therepositoryserver is generalenoughto supportotherversioncontrolstyles.Forexample,only smallchangesto therepositorytoolswould be requiredto supportconcurrentversioningin the style of CVS. In concurrentversioning,thereis nolocking at all, andnew versionnumbersare chosenat checkintime ratherthancheckout time. To supportthis,wewouldaltervcheckout sothatit rememberstheversionit startedfrom, but doesnotcreateastub. Wewouldaltervcheckinsothatit testswhetherany new versionwascheckedin by anotherusersincetheworkingcopy wascheckedout, promptingtheuserto mergechangesif so. We would alsohave to changethenamingconventionfor sessiondirectoriesslightly, becausethecurrentconventionmakesthesessionnamea functionof thereservednew versionname.

We have not written any Vesta-specifictools for merging changesmadealongdifferent branches,but Vestamakes it easyto merge branchesusing commonly

34

Page 47: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

availabletools.Vestakeepstrackof theversionscreatedalongall thebranches,alltheway backto thecommonbaseversionsfrom which they diverged,andmani-festseachoneasanordinaryfile systemdirectory. Thus,onecaneasilyusestan-darddirectory-orientedtools suchas diff , diff3 , andpatch, perhapsaugmentedwith somesimpleshellscriptsfor convenience.In contrast,RCSandCVS requirespecialvariantsof thesetoolsthatareintegratedwith their versioningsystems.

Additional tools

Thevcreatetool createsanew packagecontainingnoversions.Onecanthencheckout thenew packageandaddcodeto it. Whenapackagehasnoversions,applyingvcheckout to it createsanemptydirectoryastheworkingcopy.

Thevsessionstool providesasimplegraphicalinterfacefor managingchecked-out packages.The tool automaticallydisplaysthe latestversionnumberin eachcheckout sessionbelongingto its user. It provides an Advancebutton for eachsession,whichsimply invokesvadvanceon it.

Thevlatest tool printsthelatestchecked-inversionnumberof agivenpackage,or of all thepackagesandbranchesin agivendirectorytree.

The vwhohas tool lists the user(if any) who haschecked out eithera givenpackageor all thepackagesandbranchesin agivendirectorytree.

The vhistory tool prints a changelog for a package,listing all pastversionsandtheir checkinmessages.

Ourdescriptionof therepositorytoolshastouchedonafew repositoryfeaturesthatwe havenot yetdescribedin detail. In thenext threesectionswediscussthreeof these:mutablefilesanddirectories,mutableattributes,andaccesscontrol.

4.1.5 Mutable Filesand Dir ectories

Mutablefilesanddirectoriesin therepositoryaregenerallysimilar to ordinaryfilesanddirectoriesin thehostfile system,but therepositoryprovidesadditionalfunc-tionality for efficiently copying databetweenmutableandimmutabledirectories.Thedevelopmentcycle toolsusethis functionalityto makecheckout,advance,andcheckinvery fast. The addedfunctionality is accessedthrougha separateRPCinterface,not throughNFS.

Objectsin amutabledirectorycanbecreated,deleted(without leaving ghosts),or renamedasdesired.Mutablefilescanbemodifiedfreely.

Therepositorycanquickly createamutabledirectorywhoseinitial contentsarethesameasany givenimmutabledirectory. No datacopying is necessary. We saythenew directoryis basedon theold one. It is representedinternallyasa pointer

35

Page 48: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

to thebasedirectoryplusa list of changes,initially empty. Thus,eachfile entryina mutabledirectoryinitially pointsto an immutablefile. If a usertries to modifysucha file, however, it is transparentlycopiedto preserve the immutability of theoriginal,andanentryis addedto thechangelist pointingto thecopy. A new, emptymutabledirectoryis representedin thesameway, but with anull basepointer.

The repositorycanalsoquickly createan immutabledirectoryasa snapshotof any given mutabledirectory. It doesthis by copying the basepointerandthelist of changes.In addition,eachfile in the mutabledirectorythat wasmodifiedor newly created(and henceis mutable)is simply marked as being immutable,without makinga copy. If a userlater tries to modify thefile further throughthemutabledirectory, anew mutablecopy is madeat thattime.

In Figures4.2–4.4above,mutablefiles anddirectoriesareshown asopendia-monds.As illustrated,themutabletreeunder/vesta-work is separatefrom therepository’s maintreeof appendableandimmutabledirectoriesunder/vesta .

Currently, we do not implementsymboliclinks in mutabledirectories,andwedo not allow multiple hard links to a mutablefile. Symbolic links areforbiddenbecauseit wouldnotbemeaningfulto copy asymboliclink into theimmutablepartof therepository;it is unclearwhata symboliclink thereshouldmeanif theVestaevaluatorwere to encounterone. Multiple hard links are forbiddento simplifythebookkeepingandto guaranteethata mutablefile never hasto becopiedwhenmakinganimmutablesnapshot.In practice,theselimitationsareinconsequential.

4.1.6 Mutable Attrib utes

A sourcecontrol systemtypically needsto storemetadataaboutsourcesbeyondjusttheirnamesandcontents.For example,theVestarepositorytoolsneedto knowwhethereachappendabledirectoryis acheckout session,a package,or somethingelse,andthey needto know theconnectionsbetweenstubs,sessions,andworkingdirectories.Usersoftenwant to know whenandby whoma packagewascreated,checkedout,or checkedin, andonwhatpreviousversionsanew versionwasbased.

The Vestarepositoryprovidesmutableattributesto serve thesepurposesandothers. A sourceobject’s attributesarea total function F from string namestosetsof string values. If a namen hasnever beenboundto any value, F � n� isthe empty set. Thereare operationsto set the value of F � n� to a singletonsetor to clear it to the emptyset,andoperationsto addor remove an elementfromF � n� . (SettingF � n� to a singletonis equivalentto atomicallyclearingit andthenaddingthe value.) Therearealsoseveral operationsto queryvaluesof F . Thisfunctionalityis availablethroughanRPCinterface,andtherepositorytool vattribprovidesageneralpurposecommand-lineinterfaceto it.

36

Page 49: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Attributesarenot visible to the Vestaevaluator, so it is safefor them to bemutable;changinganattributecannotchangetheresultof anevaluation.

Eachsourceobjecthasattributesunlessits parentdirectoryis immutable.Thusin Figure4.1, the immutableversioncommon/thread/ 3 hasattributes,but itsfiles andsubdirectoriesthread.c , . . . , doc do not. This limitation is imposedto simplify the implementation. It is not a problemin the useswe have for at-tributes,becausewe generallywish to treateachpackageversionasa unit, not asacollectionof fileswith individual properties.

The repositorydevelopmentcycle tools make extensive useof mutableat-tributes. For example, the vcheckout tool puts attributes on the stub, the ses-sion, and the working directory so that all three can be found given any one.The vadvance andvcheckin tools usetheseattributesand recordsomeof theirown. In addition,vcheckout andvcheckin recordthe previous version,the userrequestingtheoperation,the time, andotherinformation. Figure4.5 shows somesampleattributesappliedby the tools. Note themessage attribute on thedirec-tory vesta/repos/30 ; it is a changelog messagesolicited from the userbyvcheckin.

We alsousemutableattributesto implementa limited form of symboliclink.A symboliclink is implementedasa stubwith an attribute calledsymlink-tothatgivesthelink value. TheNFSinterfacemanifestsany stubwith this attributeasasymboliclink, but to theevaluator, suchstubsarejust stubs,sothey cannotbeusedin modelsandtheirmutabilitycannotcompromisetherepeatabilityof builds.

As aconveniencefeature,eachappendabledirectorythatcontainsversionshasasymboliclink namedlatest thatpointsto thelatestversion.For instance,in theexampleof Figure4.1, /vesta/west.ves tas ys .o rg /c ommon/t hr ead/latest wouldbeasymboliclink to 3.

Weexpectto find moreusesfor attributesin thefuture.For example,arelease-statusattribute might be usedto mark a particularversionasinternally released,externally released,or withdrawn from releasedue to bugs. The vupdate tool(Section5.3.1), which mechanicallyupdatesthe import statementsin a Vestamodelto newer versions,couldbemodifiedto take suchanattribute into account.

Attributesarealsousedto storetheaccesscontrolinformationdescribedin thenext subsection.

4.1.7 AccessControl

Designingtheaccesscontrolmodelfor therepositorypresentedsomechallenges,becausereplication(Section4.4)andotherformsof remoteaccessaresometimesneededbetweenrepositoriesthat are in different realms; that is, repositoriesthatareunderseparateadministration,havedifferentspacesof usernames,andperhaps

37

Page 50: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

% vattrib /vesta/west.vestasys.org/vesta/repos#owner

[email protected]

packagecreation-time

Thu Aug 22 17:41:35 PDT 1996created-by

[email protected]

% vattrib /vesta/west.vestasys.org/vesta/repos/ 30session-dir

/vesta/west.vestasys.org/vesta/rep os/ch eckou t/30old-version

/vesta/west.vestasys.org/vesta/rep os/29message

Added code to gather usage statistics.content

/vesta/west.vestasys.org/vesta/rep os/ch eckou t/30/1 2checkin-time

Tue Nov 4 14:10:22 PST 1997checkin-by

[email protected]#owner

[email protected]

% vattrib /vesta-work/mann/repossession-ver-arc

0session-dir

/vesta/west.vestasys.org/vesta/rep os/ch eckou t/31old-version

/vesta/west.vestasys.org/vesta/rep os/30new-version

/vesta/west.vestasys.org/vesta/rep os/31checkout-time

Wed Dec 3 09:55:12 PST 1997checkout-by

[email protected]

Figure4.5: Sampleattributesonsomedirectories.

38

Page 51: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

do not entirely trustoneanother. Thuswe needa practicalway of authenticatingand accesscheckingcross-realmrequests. In addition, for repositoriesthat arecooperatingclosely, we needit to be meaningfulto replicateaccesscontrol lists,so thateachrepositorydoesnot have to separatelyadministerthem. This sectionoutlineshow we have addressedtheseissues.

All accesscontrollistsandaccesscheckingin therepositoryaredonein termsof global principal names,with the syntaxuser@realm or ˆgroup@realm .The realm is an arbitrary namechosenby a systemadministrator, typically anInternetdomainnamethat makes the realm’s usernamesvalid email addresses.Groupnamesarewritten with a leadingcaretonly for clarity; thereis not reallyany context in which bothgroupnamesandusernamescanappear, sothecaretisnotneededto preventambiguity.

Wehavekepttherepository’s accesscontrollistscloseto theUnix stylesothatthey canbemanifestedfairly accuratelythroughtheNFSinterface.EachobjecthasanownerACL andagroupACL, plusasetof ninemodeflagsthatindicatewhethertheowner, group,andothersaregrantedread,write,and/ordirectorysearchaccess.Unlike theUnix model,theownerandgrouparesetsof globalnames,not singlelocal names;we have provided this featuremainly so thatan objectcanbegivendifferentownersin different realmsif desired.Therefore,whenchoosingwhichowner andgroup to manifestthroughthe NFS interface,the repositorysearchesfirst for onein thelocalrealm.Therepositorymapsbetweenglobalprincipalnamesandthenumericuser/groupids usedthroughtheNFS interfaceby examiningthelocal operatingsystem’s userandgroupregistries(on Unix, /etc/passwd and/etc/group ) andbuilding up a translationtable.

Accesscontrol lists and modeflags are storedin mutableattributes named#owner , #group , and#mode. Becausenot all objectshave attributes,andtosave space,we usea form of inheritance:if an object doesnot have a particu-lar accesscontrol attribute, it inheritsthe valuefrom its parentdirectory. Hence,changingtheaccesscontrolon a directorycaneffectively changetheaccesscon-trol onotherdirectoriesandfilesbelow it in thetree,adeparturefrom conventionalUnix semantics.Thenamesof all accesscontrolattributesbegin with anidentify-ing character(“#”), andthe replicator(Section4.4) canbe instructednot to copythemevenif ordinaryattributesarebeingcopied.

Executepermissionis treatedspecially. In Unix andNFS,a file’s executebitsencodetwo logically distinct items—whetherthe file is executable,andwhetherparticularprincipalshave permissionto executeit. Within a givendirectory, somefiles may be executableandsomenot, so it is not appropriatefor files to inheritexecutepermissionfrom their parentdirectories.But a file storedin animmutabledirectorydoesnot have a #mode attribute of its own, so we cannotset its per-missionsindividually. To work aroundthis problem,the repositorymaintainsan

39

Page 52: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

executableflag for every file. When the Unix chmod function is invoked on amutablefile throughtherepository’s NFSinterface,thefile’s executableflag is setto the logical or of thethreeexecutablebits given; for immutablefiles, theflag isimmutabletoo,andchmod hasno effect. We thenusetheexecutableflag andthereadpermissionbitsto determinetheexecutepermissionbits: if afile is executable,its threeexecutebitsareacopy of its readbits; if not, they areall zero.

WealsosupplytheUnix setuidandsetgidflags.(Thesetuidflag tells theoper-atingsystemthat if this file is executedasa program,theresultingprocessshouldbegrantedtheprivilegesof thefile’sowner;thesetgidflagthatit shouldbegrantedtheprivilegesof thefile’s group.)Theseflagsareof no interestto Vestaitself, buttherepositoryis carefulto maintainthemwith thepropersecurity;for example,afile’s setuidflag is automaticallyturnedoff if its owner is changed.Theflagsareencodedin aninterestingway, designedto give theright securitypropertieswhileconsideringthepossibilityof multiple andchangingownersandgroups.Both the#setuid and#setgid attributesaresetsof principalnames.If thefirst valueof#owner that is in the local realm(theowner thatwill bemanifestedthroughtheNFS interface)is alsoa valueof #setuid , the setuidflag is set,otherwisenot.Similarly, if thefirst valueof #group that is in the local realmis alsoa valueof#setgid , thesetgidbit is set,otherwisenot.

Eachincomingrequestto therepositorymustbeauthenticatedascomingfromsomeparticularuser. Several authenticationmethodsaresupported.The reposi-tory administratorfills in atable(similar in styleto anNFSexport table)thatspec-ifies which usernamesto accept,from which hosts,usingwhich authenticationmethods,andwhetherto grantnormalor read-onlyaccess.Thusfar we have im-plementedonly two ratherinsecureauthenticationmethods:theNFSAUTHUNIXstyle,wheretheuserprovideshisnumericUnix userid andis believedif hecomesfrom a trustedhost (neededto supportmostcurrentNFS clients), anda similarstylewheretheusersuppliesa globalprincipalnameandis believed if hecomesfrom a hostthat is trustedfor namesfrom thatrealm.We have a designsketchforaddingKerberosauthenticationandhopeto implementit in thefuture.

Therepositoryrecognizesthreedifferentadministrativeprincipals.Thesystemadministrator(Unix userroot ) hasblanket permissionto performany operation,exceptoperationsthat could causea violation of the replicaagreementinvariant(Section4.4). For convenience,thereis alsoa non-rootVestaadministrator(typ-ically Unix uservadmin ) that hasthe samepermissionsasroot except for per-missionto modify the #setuid and#setgid flags; for the latter, vadmin istreatedlike an ordinaryuser. Thus, a designatednon-rootusercanmanagetheVestarepository, but cannotgain root accessto othersystemresourcesor imper-sonateotherusers.Thereis alsoa specialwizarduser(typically Unix usernamevwizard ) that is permittedto do all operations,eventhosethatcouldpotentially

40

Page 53: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

violatetheagreementinvariant;this facility is providedonly for emergency repairsandis notneededin normaloperation.

Our generalmodelfor groupaccessis that theuserneedonly authenticatehisusername;therepositorydetermineswhatgroupsheis in. Thismodelworkswellfor intra-realmaccess,becausethe repositoryusesthe local operatingsystem’sfacilities to determinethe groupmembershipof local users. A useraccessingaremoterepository, however, by default will not berecognizedasa memberof anygroups,evengroupsfrom hisown realm.To addressthisproblem,weallow reposi-tory administratorsto augmentthegroupmembershiptablewith additionalentries,but this is a bit labor-intensive. A usefulfutureadditionmight be to allow groupmembershiptablesto bereplicated.

As anadditionalfeature,oneusernameor groupnamecanbedesignatedasanaliasfor another;this is usefulwhenthesamepersonhasa login in two differentrealmsor whentwo cooperatingrealmseachhave a groupthat is working on thesameproject.

We omit the detailsof what accesspermissionsarerequiredto performmostoperations,which aregenerallyobvious. Operationson attributesarecontrolledas follows. Changingan attribute whosenamebegins with # generallyrequiresownershipaccess,except for a few specialcasessuchas#owner itself (whichrequiresadministrative access),#setuid , and #setgid . Changingother at-tributesrequireswrite access,while readingattributesis unrestricted.

4.2 The Evaluator’ sView

This sectiondescribessomefeaturesof the repositorythatareprovided speciallyfor useby theVestaevaluatorandfunctioncache.Most arenot visible to ordinaryusers.

4.2.1 DerivedFilesand Shortids

As discussedin Chapter3, runningtheVestaevaluatoron amodelcreatesderivedfilessuchasobjectmodules,libraries,andexecutableprograms.Derivedfiles arenot enteredinto thesourcenamespaceasthey arecreated:by default, they haveuser-visible namesonly within the namespaceof an evaluation,that is, within arunning programin the Vestasystemdescriptionlanguage. As evaluationpro-ceeds,theevaluatorwrites functioncacheentriesthat refer to deriveds,andwhenanevaluationconcludes,theevaluatormayship(copy or link) somederivedsintopersistent,user-visible directoriesastheevaluation’s final output.

41

Page 54: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

The repositoryprovidesraw storagefor derivedsin the form of a simpleab-stractioncalledthe shortid file. A shortidfile is simply an ordinarydisk file thatis namedby a 32-bit integer, or shortid. The repositoryprovidesan interfaceforallocatinga new shortidandcreatinga correspondingfile, andfor openinganex-istingfile givenits shortid.Bothongoingevaluationsandpersistentfunctioncacheentriesreferto derivedsby shortid.

WhentheVestaweederdeletesa functioncacheentry, someor all of thede-rivedsreferencedin theentry’s resultvaluemaybecomegarbage,soshortidfilesaremanagedusingmarkandsweepgarbagecollection.As theweederwalksoverthe cache,it assemblesa completelist of shortidsthat are referencedby cacheentriesthat arebeing retained. The weederpassesthis list on to the repository,togetherwith the time at which the list is valid. (A time just beforethe startofweedingis used,to guaranteethatderivedscreatedby evaluationsrunningin par-allel with the weederarenot deleted;seeChapter8.) The repositorycandeleteany shortidfile thatis noton thelist andwhosetimeof lastchange(Unix ctime) isearlierthanthelist’s validity time.

Internally, therepositorystoressourcefiles in shortidfiles too. By definitionasourcefile is simply a shortidfile thathasa namein therepository’s sourcenamespace.As with deriveds,the repositorymanagessourcesusingmark andsweepgarbagecollection,not referencecounts. Whenever the weedersubmitsa list ofshortidsto keep,therepositoryaugmentsthelist by walking its own directorytreeto find the sourcesit needsto keep. It thendeletesonly files that arenot on theaugmentedlist.

Sourcesandderivedsneedto bemanageduniformly becauseit is possiblefora sourceto becomea derived, andvice versa. A sourcebecomesa derived if afunction returnsa sourceaspartof its result; this happensquiteoften. Later, theoriginalsource’snamemightbedeleted(thatis, reboundto aghost),but theshortidfile thatthesourcewasstoredin will bekeptaslongasacacheentrystill referstoit. A derivedbecomesasourceif anRPCclientcallstherepositoryandasksfor thederivedto beinserted(linked)into asourcedirectory. Wedonot currentlyusethisfeature,but we have somefutureapplicationsin mind. For example,theevaluatorcurrentlyimplementstheshippingof final outputderivedsfrom an evaluationbycopying theminto a conventionalfile systemdirectory, but it could insteadlinktheminto a Vestadirectory;this implementationwould befasteranduselessdiskspace.

The repositoryalso assignsa shortid to every distinct immutabledirectory.(Two immutabledirectoriesthatareidenticalexceptfor their nameandparentdi-rectoryarenot considereddistinct; they usuallyhave thesameshortid.) This fea-tureletstheevaluatorandfunctioncachereferto a wholetreeof sourceswith oneshortid,allowing a morecompact,coarse-grainedrepresentationof dependencies

42

Page 55: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

on files in the tree. The useof directoryshortidsaddsanotherstepto the repos-itory’s garbagecollection algorithm: for eachdirectory shortid that the weederasksit to keep,the repositorymustwalk the subtreerootedat the correspondingdirectoryandkeepthesubdirectoriesandfiles thereaswell.

4.2.2 Evaluator Dir ectoriesand Volatile Dir ectories

Therepositoryprovidestwo classesof directories,calledevaluatordirectoriesandvolatiledirectories, for useby theruntoolserver(Section5.2.3).Theruntoolserverencapsulatestoolsby runningthemin anenvironmentwhereall theirfile referencesarerestrictedto suchdirectories,enablingtheVestasystemto captureandrecordall thereferencesthatthetoolsmakeandto supplyappropriate,immutablecontentsfor eachreferencedfile.

An evaluatordirectoryis a directorywhosecontentsaredefinedby a bindingthat exists in the value spaceof an ongoingVestaevaluation. A binding (Sec-tion 5.2.2) is a datastructurein the Vestalanguagethat pairsnameswith values,similar to a LISP associationlist or Perlassociative array. An evaluatordirectoryD reflectsthecontentsof abinding B into thefile systemnamespace.Eachnamein the binding appearsasa namein the directory. If the nameN1 in the bindingB refersto a valueV1 of typeText (a bytestringor file), thenthenameN1 in thedirectory D refersto a file with V1 asits contents.If thenameN2 in thebindingB refersto another, subordinatebinding V2, thenthenameN2 in thedirectory Drefersto asubdirectorywhosecontentsaredefinedby V2, recursively. Thereis alsoa way to make I/O devicessuchastheUnix /dev/null appearin anevaluatordirectory. An evaluatordirectoryis immutable,becauseit is createdandusedonlywhile anevaluationis blockedinsideacall to thelanguage’s run tool primitive.

Whenthe repositoryreceivesan NFS requestto list an evaluatordirectoryorlook up a namein it, therepositorypassestherequestthroughto theevaluatorviaanRPC.WhentheevaluatorreceivessuchanRPC,it recordsadependency on thegivennameandreturnsits value.Thevaluecanbeeithera shortid,representingafile, or a handlefor anotherbinding,representinganotherevaluatordirectory. TherepositorythengeneratestheappropriateNFSreply. Therepositorykeepsa cachefor eachevaluatordirectoryto avoid repeatedRPCsfor thesamename.

In additionto looking up files, an externaltool maycreatenew files or makechangesto existing ones. Suchchangescannotbe madedirectly to the bindingbackingthecorrespondingevaluatordirectory, sincethatwould amountto a side-effect. Instead,the repositoryrecordsall suchchangesin volatile directories.Attheendof the run tool call that launchedthetool, thechangesrecordedin thevolatiledirectoryarereportedbackto thecalleraspartof the run tool result.

A volatile directoryconsistsof a pointer to an evaluatordirectory, called its

43

Page 56: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

base, anda list of changes.Thusa volatile directory is analogousto a mutabledirectory. As in a mutabledirectory, onecancreatenew files or (thanksto copy-on-write) edit existing files.1 Thesedirectoriesarecalled “volatile” not for anyfundamentalreason,but merelybecauseour repositoryimplementationdoesnotrecordthemon disk, so they are lost if the repositorycrashesandrestarts.Anyshortidfilescreatedareof courserecordedondisk; they arelatergarbagecollected,if necessary. This volatility is tolerablebecausea volatile directoryneedsto existonly aslong asit takesa singletool to run, so theonly negative effect is that onthe rareoccasionswhenthe repositorycrashes,any ongoingevaluationsfail. Incompensation,thetaskof reclaimingresourcesafteracrashis simplified.

Runninga tool thenworksasfollows. An evaluationinvokesthe run toolprimitive with theargumentsdescribedin Section5.2.3,includingthebindingthatis to supplythe initial contentsof the tool’s root directory. Theevaluatorassignsa directoryhandleto this bindingandcalls therepositoryto createa new volatiledirectorybasedon it. Theevaluatortheninvokes the runtool server, which startsthetool with therootdirectoryname“ / ” reboundto thenew volatiledirectory. (OnUnix, this stepusesthechroot systemcall.) After the tool finishesrunning,theevaluatorcallstherepositoryto find out whatchangesthetool madeto its volatiledirectory, andusesthis informationto constructaresultbinding. If thetool createdor editedany files, their new shortidsarereturnedaspart of the changelist, andthey becomederiveds.Finally, theevaluatordeletesthevolatile directory, freeingtheresourcesit wasconsumingin therepository.

4.2.3 Fingerprints

Within the functioncache,special128-bitchecksumscalledfingerprints areusedto provide compact,uniqueabbreviationsfor values.Two valuescanbecomparedfor equalityby comparingtheir fingerprints,with a vanishinglysmall probabilityof erroneouslyconsideringtwo differentvaluesto be equal. SeeSection6.2 forfurtherdetails.

As a serviceto the function cacheandevaluator, the repositorykeepsfinger-printsfor certainkindsof filesanddirectoriesandmakesthemavailablethoughanRPCinterface.

Every immutabledirectoryandimmutablefile in the treerootedat /vestahasa fingerprint,becausean evaluationcanrefer to suchfiles anddirectoriesassources.Ghosts,stubs,andappendabledirectoriesdo not have fingerprints,be-causea(successful)evaluationcannever referto one.Therepositoryalsosuppliesfingerprintsfor files in volatileandevaluatordirectories,becauseanevaluationcan

1But seeSection4.3.3for restrictions.

44

Page 57: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

produceandcachesuchafile asaderived.Volatileandevaluatordirectoriesthem-selves do not requirefingerprints;their existenceis ephemeraland the functioncachenever containsa referenceto one. For performancereasons,we useseveraldifferentmethodsof fingerprintingfilesanddirectories.

Eachlarge immutablefile andeachimmutabledirectoryis givena fingerprintbasedonthefull hierarchicalnameunderwhichit is first placedinto therepository.That fingerprintstayswith the sourceas it acquiresnew namesvia the checkin-advance-checkout cycle or via renaming.Fingerprintingbasedon the namepro-videstherequiredsemanticsbecausetherepositoryguaranteesthatanameis neverreusedfor a differentsource.

Eachlarge new derived (that is, eachlarge new file in a volatile directory)isgiven a fingerprintbasedon an arbitraryuniqueidentifier that is generatedwhenthefile is created.Thatfingerprintis reportedbackto theevaluator, storedwith anyfunctioncacheentry thatpointsto thefile, andsuppliedagainto the repositoryifthefile laterappearsasanexistingfile in anew evaluatordirectory. Fingerprintingbasedonauniqueidentifierprovidestherequiredsemanticsbecausetheidentifiersarenever reused.It givesbetterperformancethanfingerprintingthefile contentsbecausetheidentifiersaremuchshorterthantheaveragelargefile.

Thefingerprintof a small file is computedfrom its contents.This methodoffingerprintinghassomeinterestingconsequences.For example,if auser“touches”a sourcefile in a working directory(that is, performsa write to thefile thatdoesnot actuallychangethecontents),sourcefingerprintingbasedon a uniqueidenti-fier would causethefile to be recompiledandeverythingthatdependson it to berebuilt. With fingerprintingbasedoncontents,however, thefile will berecognizedasunchangedandnothingwill be rebuilt. Further, supposea usereditsthecom-mentsin a sourcefile but doesnot changethe code. This alwayscausesthe fileto berecompiled,but if thecompileris fully deterministic,therecompilationwillgenerateaderivedfile with exactly thesamecontentsasthepreviousversion.Withderivedfingerprintsbasedoncontents,thenew derivedfile wouldberecognizedasidenticalto theold one,andprogramsthatuseit wouldnotberelinked.

Becausethetwo typesof fingerprintinggivesemanticallyequivalentresults,wehavemadethesizethresholdconfigurableat runtime.By default,wecurrentlyfin-gerprintbothsourcesandderivedsthataresmallerthanonemegabyteby contents,largeronesby uniqueidentifier.2 Thisapproachworkswell for sources,but unfor-tunately, the C andC++ compilersthat we arecurrentlyusinginsert timestampsinto theobjectfiles they produce,negatingthebenefitsof fingerprintingthesefilesby contents.

2Onthehardwaredescribedin Chapter9, afile canbefingerprintedatroughly1 MB/sec(elapsedtime).

45

Page 58: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Notethatchangingthesizethresholdbetweenthefingerprintingmethodsdoesnot causea problem.Thecacheandevaluatorrequireonly that two files or direc-torieswith differentcontentsalwayshave differentfingerprints.It is not essentialthat two files or directorieswith the samecontentsalwayshave the samefinger-print, thoughit is advantageousthat they do asoften aspossible,sincethat willallow for morefunction cachehits. (We of coursearrangethat fingerprintscom-putedby thetwo methodsdo not collide,by includinga prefix in thefingerprintedstringthatsayswhich typeof fingerprintingwasused.)

A late additionto the repositorydesignwasan internal tablethat allows anyfile or directory to be looked up by its fingerprint. This tablehastwo importantuses.First, if auseradvancesasmallfile into therepositorythatis alreadypresentwith thesamecontentsbut a differentname,therepositorylooksup thenew file’sfingerprint in the table,finds that a copy is alreadypresent,andarrangesfor thetwo copiesto sharestorage,thussaving disk space.Second,the replicator(Sec-tion 4.4.8)usesthetableto avoid makingredundantcopiesof bothfiles anddirec-torieswhencopying datafrom onerepositoryto another, andto avoid copying adirectoryin full if a copy of its base(Section4.3.2)is alreadypresentat thedesti-nation.Thetableis not kepton disk,astherepositoryis ableto rebuild it from itsotherdatastructuresuponreboot.

4.3 The Implementor’ sView

This sectiondescribessomeinterestingaspectsof therepositoryimplementation.

4.3.1 Shortids and Files

The repositorystoresshortidfiles in an ordinaryfile systemprovided by the un-derlyingoperatingsystem,undera fixeddirectorychosenwhenVestais installed.Eachfile’snameis derivedfrom its 32-bit shortid.For example,afile with shortid0x12345678would have a namelike /vesta-sid/123/ 456 /7 8. Interme-diatedirectoriessuchas /vesta-sid/123 and /vesta-sid/123 /45 6 arecreatedonly whenneededandaredeletedwhenthey becomeempty. Vestausersnever seethesefilenames. The /vesta-sid directoryandall files anddirec-toriesbeneathit have their accesspermissionssetso asto be directly accessibleonly to the repositoryserver, the function cacheserver, and the evaluator. Im-mutableshortidfiles (thosecorrespondingto immutablesourcefiles or to derivedsthathave beencompletelywritten andarereferencedby cacheentries)have theirwrite permissionbits turnedoff.

Why do processesotherthanthe repositoryserver itself have direct accessto

46

Page 59: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

the /vesta-sid directory tree? We allow this for efficiency. A processthatknows a file’s shortidcanreador write it directly thoughtheunderlyingfile sys-tem, therebyavoiding the overheadof passingdatathroughthe repositoryNFSinterfaceandoffloadingwork from therepositoryserver. Wealsopermitprocessesto createshortidfiles without contactingthe repositoryfor eachone. Thereposi-tory providesan RPCinterfacethatallocatesnew shortidsin blocksof 256. Theblockshave leases;that is, ownershipof a block timesout if it is not periodicallyrenewed,enablingtherepositoryto reclaimblocksallocatedto processesthathavecrashed.A layerof library codeprovidedby therepositoryimplementationhidesthecomplexity of block allocationandleaserenewal from clientprograms.

This efficient route for accessingshortid files unfortunatelyseeslittle use.Nearlyall accessesto shortidfilesarethroughtherepositoryNFSinterface.Usersof courseaccesssourcesthroughNFS.Encapsulatedtoolsreadexistingsourcesandderiveds,andwrite new deriveds,usingthe repository’s NFS interfaceto volatiledirectories.Only a few accessesby theevaluatoritself andby theweederbypassthe repositoryNFSserver. In hindsightit might have beena betterdesignchoiceto omit thisaccesspath,insteadmakingthe /vesta-sid directorytreeaprivatedatastructurethatis hiddenfrom all processesbut therepositoryserver itself.

To conserve memory, the repositoryavoids keepingany sort of in-memorystructureindexed by file shortid. It keepsa recordonly of which 256-shortidblocks are currently leased. When someprocessholds a leaseon a block, thatprocessmaintainsa bitmapof unusedshortidsin the block. The initial valueforthis bitmapis computedwhentheblock is allocated,by looking at the /vesta-sid directorytreeandseeingwhich shortidsin the block arecurrentlyboundtofiles. Whenallocatingnew blocks,the repositorytries to chooseemptyones,butdoesnotguaranteeto dosoevery time.

Therepositorydoesneedto keepanindex thatmapsfrom immutabledirectoryshortidsto theactualdirectorystructurein memory. Theindex is storedasa hashtablein memory. Theindex is little used,sothereshouldbenoseriousperformanceproblemif it falls out of the repository’s working setandneedsto bepagedbackin from disk whenused.It might have beenpreferableto eliminatethis index bycomputinga directory’s shortidfrom its memoryaddress,but this wasimpracticalbecausetherepository’s memorymanagement(discussednext) sometimesmovesdirectoriesto differentaddresses.At any rate, the index doesnot take up muchspace;seeSection9.3.2.

4.3.2 Dir ectories

Therepositorykeepsall of its directorystructurein virtual memory. This includesall five directorytypesdescribedabove (appendable,immutable,mutable,evalua-

47

Page 60: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

tor, andvolatile),plusstubsandghosts.File datais storedin shortidfiles;directoryentriesfor filespoint to themby shortid.

While the directory structureexternally appearsto be a tree, internally it isa DAG (directedacyclic graph). Much structurethat appearsto be repeatedinthe tree occursonly oncein the DAG, therebysaving a greatdeal of memory.Internally, directoriesdo not have parentlinks. Therefore,when an immutabledirectory with the samechildren appearsat two or more different placesin thetree,internallythereneedbeonly onecopy. This sortof sharingoccurswhenevera packagehasa subdirectorythat remainsunchangedfrom onepackageversionto the next. Also, assketchedearlier, every directoryis implementedasa list ofentriestogetherwith a (possiblynull) basepointer. Therefore,whenonedirectoryis nearlythesameasanother, thesecondcanberepresentedcompactlyasa list ofchangesrelative to thefirst. This sortof sharingoccursbetweeneachversionof apackageandthepreviousversion.Currently, thereis no limit on how long a chainof basepointerscangrow, but it would be straightforward to adda heuristicthatbreaksthechainif it becomessolong thatlookupsaretooslow.

Therepositorypacksits in-memorystructuretightly to keepmemoryconsump-tion down. Thegoalis to keepthestructuresmallenoughto stayresidentin phys-ical memoryatall times,sothatdirectoryaccesswill notbeslowedby paging.Tothis end,referencesbetweenpartsof thestructureuse32-bit arrayindicesinsteadof pointers(which are64 bits wide on Alpha, thearchitectureon which Vestawasfirst implemented),andrecordfieldsarearbitrarily byte-aligned.We demonstratetheeffectivenessof this compactionin Section9.3.2.

To keepthedirectorydatastable,we usea simpleloggingandcheckpointingtechnique[7]. Wheneverarepositoryclientrequestsanupdateoperation,theserverappendsa recordof the operation’s nameandparametersto a log file andforcesit to disk beforemodifying the in-memorydataandreturningto theclient. If therepositorycrashes,uponrestartit replaysthelog to restoreits state.Operationsonvolatile directoriesarenot logged.

To make recovery faster, the repositoryoccasionallymakes a checkpoint bydumpingits stateto a file. Thenext recovery thenbegins from themostrecentlycommittedcheckpoint. The checkpointcode is actually an application-specificcompactinggarbagecollector, socheckpointinghastheusefulside-effectof reduc-ing memoryfragmentation.Thealgorithmis designedfor minimalmemoryusage.First, it writes the compactedcheckpointdirectly to a file, not to a second(“to-space”)memoryregion asanordinarycopying garbagecollectorwould. Second,whenit copiesanobject,it putstheforwardingpointerto theobject’s new address(which is neededto keeptheobjectfrom beingcopiedmorethanonceif therearemany pointersto it) into thefirst few bytesof theobject’s old location. Thusthealgorithmis destructive, necessitatinganimmediaterecovery from thecheckpoint

48

Page 61: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

file whenit is complete.3 This recovery is transparentto therestof thesystem;wemake it so by including even volatile directorystructuresin thecheckpoint.Thevolatile structuresarewritten at the endof the checkpointandareignoredwhenrecoveringfrom arealcrash.

Our loggingpackageis sufficiently general-purposethatwe useit in thefunc-tion cacheserver aswell astherepository. It permitsanarbitrarynumberof bytesto be atomically appendedto a log, even if the underlyingfile systemand diskcontrollerhardwarereorderwritesto disk,by includingaversionnumberin everydiskblock. Thelog packsdatatightly into diskblocks,yet ensuresthatcommitteddatais not lost even if a hardwarefailurecorruptstheblock currentlybeingwrit-ten. It achievesthis by alternatelywriting two differentblockswhenaddingmoredataat thetail of thelog, like theping-pongalgorithm[21] but avoiding theneedto sometimesdo the lastwrite twice. Thepackagesupportsfuzzy checkpointing;that is, makinga checkpointin parallelwith appendingmoredata.Therepositoryserver doesnot requirethis feature,but thecacheserver usesit to enableweedingto proceedin parallelwith normaloperation.

4.3.3 Longids

To operateasanNFSserver, therepositorymustassigna 32-byteNFSfile handleto everyfile anddirectoryit stores,andit mustbeableto look upafile or directoryby its handle. For properNFS semantics,the meaningof a handlemust remainstableacrossrepositorycrashesandrestarts,andahandlefor adeletedobjectmustnot bereused(at leastnot soon).To keepfrom usingtoo muchmemory, we wantto avoid having a largetablethatmapsfrom handlesto memoryaddresses,yet wecannotusememoryaddressesdirectly as handlesbecausecheckpointingmovesdirectoriesto differentaddresses.

Anotherproblemtherepositoryfacesis how to implementtheparentlinks in aUnix-like directorytreewhentheinternalDAG representationdoesnot storesuchlinks. Indeed,asmentionedearlier, several externally visible andapparentlydis-tinct directorieswith differentparentsmaysharethesameinternalrepresentation.

Wesolveboththeseproblemswith onenovel mechanism,thelongid. A longidencodesthepaththroughtheexternallyvisibledirectorytreethatwasusedto reachanobject. Eachcomponentof thepathnameis encodedasan index number. Thelow-orderbit of this index numberindicateswhetherthe entry is in the mutablechangelist of thedirectoryor in theimmutablebase,while theotherbits give theentry numberwithin the indicatedlist. The index numbersare representedin avariable-widthencoding(wherenumbers0 to 27 � 1 requireonebyte,27 to 214 � 1

3If the repositorycrasheswhile writing a checkpoint,it will recover from the mostrecentsuc-cessfulcheckpointandthesucceedingoperationlog.

49

Page 62: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

require two, etc.), and the resultingbytesare concatenatedto form the longid.We reserve 0 asa terminator. Longidsarerestrictedto 32 bytesin lengthso thatthey canbeuseddirectly asNFShandles.Restrictingthe longid lengthof courserestrictsthe depthof the directory tree, but 32 bytesseemsamplefor any sizesourcepool we canimaginestoring.4 Longidsarerepository-specific,not global;replicationdoesnot transmitthemfrom onerepositoryto another.

Longids have the major propertieswe need. Every object in the externallyvisible directory treehasits own longid, which remainsstableacrossrepositoryrestarts.Longidsarenotreused,becausedirectoryentriesarenotreusedor deleted.(In directorytypesthatpermitdeletion,we keepaninvisible placeholderfor eachdeletedentryto preventits index numberfrom beingreused.)Therepositorylooksup a longid by traversingthedirectorytreemuchasif it werelooking up thecor-respondingname;this traversalis fastbecausethe entiretreeis kept in memory.Givenanobject’s longid,onecanfind its parent’s longid simply by truncatingthelastcomponent.

Longidsdo not quite provide a perfectimplementationof the expectedNFSsemantics,but we have beenableto paperover thedifferenceeffectively. In par-ticular, if anobjectis renamed,its NFShandleis expectedto remainthesame.Inthe repository, an object that is renamedunavoidably getsa new longid, but wereplaceits old directoryentrywith a forwardingpointerto thenew onesothat itsold longid cancontinueto work also.Theold longid stopsworking if theobject’sold parentdirectoryis deleted,however. Also, theexistenceof two handlesfor thesamemutablefile will causean NFS cachecoherenceproblemin the extremelyunlikely eventthatthesameclient hasthefile opentwice,onceundertheold han-dle from beforeit was renamedandonceunderthe new nameandnew handle.Eventhoughbothopensaredoneby thesameclient andthuswould normallybecoherent,in this casetheclient seestwo differenthandles,so it will cachethefiletwice andfail to keepthetwo copiescoherent.We have never seensucha caseinpractice.

Longids as describedso far have one major drawback. When the evaluatorcreatesa new volatile directory tree to provide an encapsulatedenvironmentfora tool, all the files and directoriesin the treeacquirenew, uniquelongids. Butwhentheevaluatorrunsseveral tools in successionin thecourseof a build, manyshortidfiles areaccessedrepeatedly;for example,standardC headerfiles thatarereadby many compiles,or objectfiles that arewritten by a compileandreadbya subsequentlink. For goodperformance,tools shouldgetNFS client cachehitswhenthey accessfilesthatothertoolshaverecentlyaccessed,but thisis impossiblewhenthesamefile is seenashaving a differentfile handleeachtime a new tool is

4In ourown usagethusfar, we have notexceeded13bytes.

50

Page 63: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

run. Wehadsubstantialperformanceproblemsin largeVestabuildsuntil wefoundasolutionto thisproblem.

Our solutionis asfollows. Files in volatile directoriesareoptionally given ashortid-basedlongid, anew kindof longidthatencodesthefile’sshortidandfinger-print insteadof its pathname.Thus,thesameshortidfile hasthesamefile handleevery time any tool encountersit, andthetoolsseegoodNFScacheperformance.All directoriescontinueto usestandardpathname-basedlongids.

Shortid-basedlongids have two drawbacksthat keepthem from universallydisplacingpathname-basedlongids.

First, a level of indirectionis lost, makingcopy-on-write impossible.That is,with apathname-basedlongid,therearetwo levelsof indirection:thelongidspeci-fiesapaththroughthedirectorytree,andthetreespecifiesashortid.Thereforewecanchangetheshortidthata longidpointsto by changingthetree;in particular, wecando copy-on write by changingthe treefrom specifyingan immutableshortidfile to specifyinga new, mutablecopy of the samefile with a different shortid.But a shortid-basedlongid specifiestheshortiddirectly, so its meaningcannotbechangedthisway. Thismakesshortid-basedlongidsunsuitablefor files in mutabledirectories,sowedonotusethemthere.It alsomeansthattoolscannotbeallowedto modify existingfilesin volatiledirectories,astheoriginaldesignpermitted(Sec-tion 4.2.2),becausethatalsorequirescopy-on-write.This limitation is noproblemfor mosttools, but to accommodateunusualtools that needto be ableto modifyexistingfiles,wehave madetheold behavior selectableby aflag to theevaluator’srun tool primitive (which is in turnpassedto therepository’s volatiledirectory

creationprimitive).Second,theability to find asourceobject’s parentdirectoryis lost. Thismakes

shortid-basedlongidsunsuitablefor directories.It alsomeansthata file’s accesscontrolcannotbeinheritedfrom its parentdirectory, whichmakesthenew longidspoorlysuitedfor filesin immutableandappendabledirectories.Onecouldimagineliving with the accesscontrol limitation, by making all immutablefiles world-readableandrelying on directoryaccesscontrolsto protectthemwhennecessary,but thisoptionis unattractive.

Fortunately, pathname-basedlongidsprovideadequateNFScacheperformancein mutable,immutable,andappendabledirectories,becausesuchdirectoriesarenot createdanddeletedfrequently, soretainingthemtherehascausedusno prob-lems.

4.3.4 Copy on Write

As explainedearlier, therepositoryusescopy-on-writeto save diskspace.Whenausercreatesa mutabledirectorybasedon an immutableone,all thefile entriesin

51

Page 64: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

thenew directoryinitially referto immutablefiles. If theusertriesto write to sucha file, therepositorycopiesit to a new file with anew shortid,addsanentryto the“changes”portionof themutabledirectoryto point to it, andwritesto thenew fileinstead.

Pathname-basedlongidsaddan extra complicationhere. In the situationde-scribedin thepreviousparagraph,thenew entryhasadifferentindex numberthantheold one,soalthoughthenew copy of thefile hasthesamenameastheold one,it hasa differentlongid! Wefix this problemby settinga flag in thenew directoryentry to indicatethat theold longid shouldcontinueto beused,not thenew one.Whentherepositorylooksupanameto find alongid,if it encountersanentrywiththis flag set,it looksfor anotherentrycontainingthesamenamein thedirectory’simmutablebaseandusesthatentry’s index numberin thelongid. Whentherepos-itory looks up an objectby longid, eachtime it encountersan index numberthatpointsinto theimmutablebaseof amutabledirectory, it extractsthenamefrom thedirectoryentryfoundandchecksto seeif thereis aflaggedentryin the“changes”portionwith thesamename.Thissolutionusesminimalspacebut doesslow downnameandlongid lookupabit.

We alsodo copy-on-write for directories. A new mutabledirectorymay bebasedon an immutabledirectorywith immutablesubdirectories.If a usertries toedit a file (or make any otherchange)in sucha subdirectory, therepositorycopiesthe old immutablesubdirectoryto a new mutableone,andaddsan entry to themutableparentdirectoryto point to it. In this case,of course,the copying itselfis optimizedby creatingthecopy asa new directorybasedon theold onewith aninitially emptylist of changes.If theusermakeshisfirst edit severallevelsdeepinthedirectorystructure,thecopying processis appliedrecursively.

4.3.5 NFS Interface

TherepositoryNFSserver runsentirelyin userspace.It is simplyasoftwarelayerontopof thebasicrepositoryfunctionality, whichis (aswehavedescribed)layeredontopof anordinaryfile system.Thelayered,user-spaceapproachmakesfor sim-pler implementationanddebuggingthana kernel-residentapproach,but it incursadditionaloverheadin datacopying and context switching. We show in Chap-ter 9 that althoughthe repositoryprovidespoorerfile systemperformancethanastandardkernel-residentNFSserver, it is still fully adequatefor ourpurposes.

TheNFSserver implementationusesa modifiedversionof Sun’s ONC (OpenNetwork Computing)RPClibrary. Theoriginal library wasdesignedfor useonlyin single-threadedprograms;in particular, its server-sideduplicatesuppressionma-chineryassumestherecanbeonly oneoutstandingrequestat a time. But becauseNFSis built onasimplerequest/responseprotocolwith nodatastreaming,NFSim-

52

Page 65: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

plementationsgenerallyperformbadly unlessmany NFS readsand/orwritescanbein flight simultaneouslybetweenthesameclient andserver in separatethreads.Werewrotetheduplicatesuppressionmachineryto enablemultithreading,therebyremoving thisperformancebottleneck.

Therepositorycheatsslightly in its implementationof NFSversion2 seman-tics. For strict correctness,when a client writes to an NFS2 server, the servermustmake surethedatais stable(eitheron disk or in nonvolatile memory)beforeacknowledging the write to the client. Otherwise,if the server shouldcrashandrestartwith somedataacknowledgedbut not actuallyon disk, the client’s cachewouldbecomeincoherentwith theserver’s stateandthedatawouldnever bewrit-ten. Therepositorydoesnot implementthesesemantics;it passeswriteson to theunderlyingoperatingsystembeforeacknowledgingthemto theclient, but it doesnot wait for theoperatingsystemto force themout to disk. Thereforeif thema-chinethattherepositoryserver is runningon(not just therepositoryprocessitself)crasheswhile a client is actively writing to it, a write that the client believeshasbeendonemayactuallybelost.

We have chosento leave this problemuncorrectedbecauseit rarelyoccursinpractice,typically haslittle impact,andwould be quite difficult to fix. The bestlong-termfix would be for us to upgradetherepository’s NFSimplementationtoNFS version3, but this protocol is muchmorecomplex thanNFS version2. Asimplefix within theNFS2framework would be to make a Unix fsync systemcall to forceeachwrite to disk beforeacknowledgingit backto theclient, but thischangewouldsignificantlyharmtherepository’s NFSperformance.

4.3.6 RPC Interfaces

Therepositoryhastwo additionalRPCinterfaces,onefor accessto shortidfilesandonefor accessto thedirectorystructure.Most of theinterestingfeaturesavailablethroughtheseinterfaceshave beendiscussedabove, so we do not describethemfurtherhere.BothinterfacesusetheSRPC(simpleRPC)packagedescribedbrieflyin Section9.6.

4.4 Replication

Increasingly, large software systemsare developedin parallel at geographicallydistributedsites. The Vestarepositorywasthereforedesignedto make it easytoreplicatesourcesat many sites. To enabledevelopersto work independently, thereplicationdesignallows eachrepositoryto operatemostlyautonomously, therebyreducingtheoverheadof normalrepositoryoperations.

53

Page 66: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Our conceptof replication is broad, encompassingeverything from sourcedistributions issuedon CD-ROM to cooperative developmentof onesourcepoolacrossmany geographicallyseparatedsites. We begin by presentingthe overallconceptandexplaininghow it hasaffectedthedesignof sourcenaming,mutableattributes,andaccesscontrol. We thengo on to describethe replicationtoolswehave written.

4.4.1 Global Namespace

Conceptually, Vestasourcesarenamedin a singlenamespacethatis globalacrossall Vestarepositories.As describedearlier, thenamespaceis organizedasa tree.Eachrepositorystoresa subtreeof the total namespace.Replicationexists whentwo or morerepositoriesstoresubtreesthatoverlap. In this case,we requirethattheoverlappingportionsagree; thatis, (1) thesamenameis notboundto differentvaluesin two different repositories,and(2) at mostonerepositoryis masterforeachname.Wedefinemastershipandagreementpreciselyin thenext section.Weoften usethe term partial replication for our style of replication,becauseeachrepositorycanreplicateall, part,or noneof thedatastoredin any otherrepository.

Wehave chosenthestandardVestanamingconventionto make it easyfor newsitesto adoptVestawithout inadvertently creatingsourcenamesthat clashwiththoseat othersites. Thus,two sitesthat initially know nothingof eachotherandshareno sourcescanstill beconsideredparticipantsin theglobalnamespace,andcanlaterdecideto cooperateandtakereplicasof eachother’s code.In thestandardnamingconvention,therootof theglobalnamespaceis called/vesta . Wechosethis namesimply to make it easyto mounttherepositoryinto a standardUnix filenamespace.Whena new Vestasite wantsto createsourcesthat are initially notsharedwith any othersite,by conventionthesiteadministratorputsthemin anewdirectoryimmediatelybelow /vesta , namedwith anInternetdomainnamethatthe site owns. In this way, the new namesaremadeglobally uniquewithout theneedfor any specialcoordination.Thismechanismis notperfect,becauseInternetdomainnamescanbederegisteredandlater reregisteredto someotherowner, butwe believe it is adequatefor practicaluse.

Sourcesthatareto bedistributedwidely shouldbenamedcarefully, sothatthenamesmake senseto the peoplewho will be usingthem. For example,we planto make the Vestasourcespublicly availableundera directorynamed/vesta/vestasys.org , notonenamedfor someparticularmachine.

54

Page 67: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

4.4.2 Mastership

Everysourceobjectin aVestarepositoryhasabooleanmasterflag,includingfiles,directories,stubs,andghosts.Whenanobjectis replicatedin severalrepositories,its masterflag is set in at most one of them. Mastershipfor an object can betransferredfrom one repositoryto another;this must be doneusing a carefullydesignedprotocol, to ensurethat a failed transferattemptstill leaves the objectwith exactlyonemaster. Wedescribemastershiptransferin Section4.4.5below.

Mastershipis importantchiefly for appendabledirectories. The mastercopyof an appendabledirectoryis thesynchronizationpoint for addingnew namestothe directory. Arbitrary new namescanbe freely addedto a masterappendabledirectory, but new namescanbeaddedto anonmasterappendabledirectoryonly bycopying themfrom anotherrepository. Whenanappendabledirectoryis masteredataparticularrepository, wedonotrequirethatrepositoryto storeacompletecopyof the subtreerootedat that directory. Nevertheless,the masterrepositoryneedsto keepa completelist of boundnamesto prevent clasheswhennew namesareinserted.To supportthis,weaddanotherusefor thestubsandghostsintroducedinSection4.1.1above: themastercopy maybind somenamesto placeholderstubsor ghostswhile othercopiesstoretherealdata.Notethatan immutabledirectorycannotcontaina stubor ghost. Thusat eachrepository, every treerootedat animmutabledirectoryis eithercompletelypresentor completelyabsent.In termsofpackagesandversions,if any file or directoryfrom apackageversionis replicatedin agivenrepository, thenthatentireversionmustbereplicatedthere.

Mastershipis importantfor stubsaswell. A masterstubcanbefreely replacedwith a newly createdsourceof any type,while a nonmasterstubcanbe replacedonly by copying from anotherrepository. In eithercase,the new sourcehasthesamemastershipstatusasthe old stub. The reservation stubsintroducedin Sec-tion 4.1.4abovearemasterstubs.Thus,amasterstubis aplaceholderfor datathathasyet to be created,while a nonmasterstub is a placeholderfor datathat mayexist in anotherrepositorybut is not replicatedlocally.

We alsomake a distinctionbetweenmasterandnonmasterghosts.Both typesof ghostsindicatethat a previously existing sourcehasbeendeleted;thus theyserve a placeholderrole similar to that of stubs. We allow either type of ghostto be replacedby a copy of the sourcetaken from anotherreplica,with the newsourceretainingthemastershipstatusof theold ghost,exceptthata masterghostcannotbechangedto amasterappendabledirectoryor masterstub. It is impermis-sibleto changeamasterghostto amasterappendabledirectorybecausewecannotguaranteeto restoreall the namesthat wereboundin the directoryat the time itwasdeleted.It is impermissibleto changeamasterghostto a masterstubbecausethemasterstubcould in turn bereplacedby anarbitraryobjectdifferentfrom the

55

Page 68: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

name’s original,pre-ghostvalue,thusviolatingVesta’s immutability guarantee.5

For othertypesof objects,therule thatanamehasatmostonemasterremains,but mastershiphasnootherenforcedmeaning.By convention,however, themastercopy of anobjectcanbethoughtof asthe“main” copy, whichshouldnotbedeleted(or replacedwith aghost)without thinking twice.

4.4.3 Replication Example

Figure4.6shows anexampleof two repositoriesthatpartially replicateeachotherandarein agreement(consistent).Thefigure illustratesseveral commonpatternsthatoccurin realVestausage.On theleft is thewestcoastrepositoryof theimagi-naryVestaSystemsOrganization;on theright is its eastcoastrepository.

= Immutable = Master Appendable = Nonmaster Appendable = Master Stub = Nonmaster Stub

/vesta

west.vestasys.org

common

textthread

cache

1 2

table

12

3

/vesta

west.vestasys.org east.vestasys.org

common

text

cachethread

2

gui

12 1 3

Figure4.6: Two repositories(westernandeastern)that are in agree-ment.

As discussedabove,therootdirectory/vesta is notmasteredateitherrepos-itory, andthenamesdirectlyunderit look like Internetdomainnames.Thewesternrepositoryhascreatedasubdirectorynamedwest.vestasys.o rg andholdsitsmastercopy, andtheeasternrepositoryhasdonethesamefor east.vestasys.org .

Partial replicationoccursat several levelsof thetree.At thetop,partof west.vestasys.org/c ommon is replicatedin theeasternrepository. Thewesterncopy hasa completelist of names(thread , text , table , andcache ), whiletheeasterncopy is lacking table . Thewesterncopy doesnot have thecontents

5In hindsight,we could have simplified the designby eliminatingboth masterandnonmasterghosts,usingnonmasterstubsfor deleteditemsinstead.

56

Page 69: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

of thecache directory, but doeshaveastubwith thatnameasaplaceholder;thusno onecancreatea differentcache directorythat would clashwith the copy intheeasternrepository.

Onelevel down, in the thread package,thewesterncopy is masterandhasall threeversionsthat currentlyexist, while the easterncopy doesnot currentlyhave version3. The text packageillustratesthe point thata directoryneednothave thesamemasterasits parent;it is masteredat theeasternrepository. Perhapswhenfirst createdit wasmasteredat thewesternrepositoryandlatermovedto theeasternrepository, sinceversion1 is presentin thewestbut not in theeast.Sincetheeasterncopy is master, it musthave a completelist of names,so it hasa stubfor version1, perhapsinsertedat thetime it received mastership.In addition,theeasterncopy hasa masterstub for version3. This masterstub is a placeholderfor an object whosecontenthasnot yet beensupplied;the masterrepositoryisfree to replaceit later with a different type of object,but thereafterit cannotbechangedbackto a masterstub. Masterstubsareusedby vcheckout to representreservations,asdescribedin Section4.1.4above.

4.4.4 Agreement

Wearenow readyto givetheprecisedefinitionof Vestaagreement. Let A andB beVestasourceobjects,let A �master denotethemasterflagof A, let A � r epos denotethe repositorywhereA is stored,andif A is a directory, let A � names denotethelist of namesthat areboundin it. Then A � B (read“ A agreeswith B”) if thefollowing recursively definedconditionshold:

1. A �master � B �master � A � r epos � B � r epos and

2. At leastoneof thefollowing holds:

(a) A andB arefileswith identicalcontents.

(b) A andB areimmutabledirectorieswhere

i. A � names � B � names, and

ii. n : n A � names � A� n � B � n,

(c) A andB areappendabledirectorieswhere

i. n : n A � names � n B � names � A� n � B � n,

ii. A �master � B � names � A � names, and

iii. B �master � A � names � B � names.

(d) A andB arebothmasterstubs.

(e) A or B is anonmasterstub.

57

Page 70: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

(f) A or B is aghost.

In addition,wesaytwo repositoriesagreewhentheir replicasof therootdirec-tory /vesta agree.

Condition1 effectively saysthatthesamesourceis not masteredin morethanonerepository. It is statedin an odd-soundingway so that agreementcanbe re-flexive (thatis, sothata repositoryagreeswith itself). Condition2d is alsoneededonly for reflexivity. Conditions2aand2b requirereplicasof immutablefiles anddirectoriesto beidentical.

Conditions2cand2emakepartialreplicationpossible.By 2c,two appendabledirectoriescanagreeevenif oneor bothhave only a subsetof thecompletesetofnamesdefinedin thedirectory. But themasterreplicahasacompletelist of names;thus,themastercancoordinatethecreationof new names,assuringthat thesamenameis never boundin differentreplicasto sourcesthatdo not agree.By 2e twoappendabledirectoriescanagreeevenif onehasa nonmasterstubwheretheotherhassomeotherobject.

Conditions2d–2f reflecttheway stubsandghostsareintendedto beused,asdescribedabove. A masterstubagreeswith nothingbut itself or a nonmasterstub,becauseamasterstubrepresentsasourcethatis to becheckedin later. If themasterstubis still present,theactualsourcehasnotyetbeencheckedin, soit cannotexistin adifferentrepository. A nonmasterstub,however, agreeswith anything. A ghostalsoagreeswith anything,becauseanobjectcanbedeletedfrom onerepositorybutremainpresentin others.

Notice that the agreementrelation is not transitive; pairwiseagreementbe-tween A and B and betweenB and C is not sufficient to guaranteeagreementbetweenA andC. This nontransitivity is anunavoidablepropertyof partial repli-cation. Replicasareconsideredto agreewhentheir overlappingportionsdo notclash;but A andC mayoverlapandclashin a portion thatdoesnot overlapwithB. For example,supposethat /vesta/foo is masteredat repositoryA, andthat/vesta/foo/bar is an immutabledirectoryin repositoryA, absentin reposi-tory B, andan immutablefile in repositoryC. Then A � B and B � C, but thedirectoryat A clasheswith thefile atC, so A doesnotagreewith C.

Lestthisexamplecreateawrongimpression,wemuststressthatVestareplica-tion is not intendedto operatein amodewheresomepairsof repositoriesagreeandothersdo not. We want to establishandmaintainglobal agreement: the invariantthatall pairs of repositoriesagree.

Wepreserveglobalagreementby allowing arepositoryto changeonly throughtheapplicationof specificoperationsthatareknown to besafe. For anoperationthatmodifiesonly onerepositoryA, it is sufficient to show thatfor all repositoriesB, if initially A � B holds,thenit still holdsaftertheoperation.For anoperation

58

Page 71: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

thatmodifiestwo repositoriesA and B, it is sufficient to show that for all repos-itories C, if initially A � B � B � C � C � A holds, thenit still holdsaftertheoperation.Dueto nontransitivity, all threetermsin thepostconditionmustbeproved,andall threein thepreconditionmustbegiven.

4.4.5 Agreement-Preserving Primiti ves

It is easyto establishinitial agreementamongrepositories,sincea new repositorythatcontainsonly anemptycopy of theroot directory/vesta agreeswith everyotherrepository. Thereafter, we needto know how to make changesto agreeingrepositoriesin a safeway, onethat is guaranteednot to breaktheagreement.Wewantrepositoriesto operatemostlyautonomously, soit is importantthatmostop-erationscanbeperformedwithout consultinganotherrepository, andthat therestrequireconsultingonly oneother. Ourdefinitionof agreementis designedto makethis fairly easy. The following operationsaresafe,andthe repositoryserver pro-videseachoneasa primitive.6 All theprimitivesareatomicexceptfor number7,mastershiptransfer.

1. Createanew masterappendabledirectoryin /vesta , usingalocally ownedInternetdomainname.

2. Createa new child objectof any immutableor appendabletypein a masterappendabledirectory.

3. Replaceamasterstubwith anew immutableobject.

4. Replaceany child of anappendabledirectorywith aghostthathasthesamemastershipstatusastheold child.7

5. Copy any child into an appendabledirectoryfrom anotherrepository, pos-sibly replacingan existing ghostor nonmasterstub. If the original is anappendabledirectory, thecopy is anemptynonmasterappendabledirectory;if desired,its childrencanbecopiedby furtherapplicationsof theprimitive.If the original is immutable,however, it is copiedin full, including all itsdescendants.

6. Createanonmasterstubin anappendabledirectory, if anotherrepositoryhasthatnamedefined.

6Wedonotpresentaproofof safety, but theoperationsaregenerallysimpleandit shouldbeeasyfor thereaderto convincehimselfthateachoneis safe.

7Several variantsof this primitive arealsosafe,but we currentlyusethis one. Safealternativesincludeusinganonmasterstubin placeof aghost,and/orcompletelyremoving thechild if theparentis notmaster.

59

Page 72: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

7. Transfermastershiponanobjectfrom onerepositoryto another, at thesametime addingstubsto the new masterfor any missingchildren of the oldmaster.

Internally, therepositorybuilds themorecomplex primitiveson topof a setofsimplerprimitivesfor addingandreplacingsingleobjects,usinga built-in featurethatallows for shortatomictransactionswithin a singlerepository. Primitives1–4runata singlerepository.

Primitives5 and6 requireconsultinganotherrepository, but amulti-siteatomictransactionis not required;it is sufficient to readthedatafrom thesourcerepos-itory, thenatomically insert a copy into the destination. No lock on the sourcerepositoryis neededwhile readingtheoriginal,sinceit cannotchange;at worst,itcanbereplacedwith astubwhile thereadis in progress,but thissimplycausestheprimitive to returnanerrorwithoutchangingthedestination.

As mentionedat theendof Section4.2.3,thedestinationrepositoryoptimizesthe copying processin primitive 5 to avoid makingredundantcopiesof objects,suchas multiple objectsthat have the samecontentbut different names. Eachrepositorykeepsan in-memorytablein which eachlocally storedimmutablefileandimmutabledirectorytreecanbelooked up by its fingerprint. Whenanobjectis to becopied,thedestinationrepositoryfirst looksupits fingerprintin thetabletofind whethera copy is alreadypresent;if so,therepositorylinks theexisting copyinto its namespaceinsteadof makinganother. In addition, if a directorybeingcopiedis encodedin the sourcerepositoryasa list of changesrelative to a basedirectory(Section4.3.2),andthedestinationrepositoryalreadyhasa copy of thebasedirectory, thenthedestinationencodesthecopy in thesameway.

Primitive7, mastershiptransfer, is themostcomplex. Ourgoalsin choosinganimplementationwereto guaranteethat agreementcould not be violated,to avoidblockingeitherrepositoryduringthetransferprotocol,to minimizethelikelihoodof a failureresultingin neitherrepositorybeingmaster, andto keepahint on eachnonmasterobjectasto whereits masterrepositoryis located.

In barestoutline, our implementationconsistsof two separateatomicopera-tions.First, therepositorycedingmastershiponanobjectmakesacompletelist ofits childrenandturnsoff its masterflag; second,the repositoryacquiringmaster-shipinsertsany missingchildreninto its copy asnonmasterstubsandturnson themasterflag. To meetour goals,the full implementationalsotakescareof updat-ing themasterlocationhints,andit keepsa stablerecordof in-progresstransfersat both repositories,persistentlyretrying themuntil they arecomplete.With thisimplementation,agreementcannotbeviolated,andtheobjectcanbe left withouta masteronly if onerepositorycrashespermanentlyor thenetwork link betweenthemis permanentlyseveredwhile a transferis in progress.

60

Page 73: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

In more detail, our implementationworks as follows. Stepscarriedout bythe repositorytrying to acquiremastershiparenumberedstartingwith A1; stepscarriedout by the repositorycedingmastershipare numberedstartingwith C1andareindented.Request/grantidentifiersandmasterlocationhintsarestoredinmutableattributes.

A1. Checkthattherequestinguserhasthenecessaryaccesspermissionsandthatthecurrentmasterrepositorycanbereachedover thenetwork; quit if not.

A2. Choosea uniquerequestidentifier and recordit on the local copy of theobject.

A3. Ask thecurrentmasterto cedemastership,supplyingtherequestidentifier.

Do stepsC1–C4atomically:

C1. Checkthat the requestinguserhasthe necessaryaccesspermissions;refuseto cedemastershipif not.

C2. Formagrantidentifier. If theobjectis anappendabledirectory, do thisby appendinga list of its childrento the requestidentifier; otherwiseusetherequestidentifier. Recordit on thelocal copy of theobject.

C3. Changetheobject’s typefrom masterto nonmaster, andrecordthenewmasterrepository’s locationonit. This locationis of courseonly ahint,sincemastershipcouldmove to yet anotherrepositorylater.

C4. Returnthegrantidentifierto thecaller.

A4. If thecurrentmasterrefusedto cede,erasetherequestidentifierandquit.

A5. Atomically fill in any missingchildrenlistedin thegrantidentifier(creatingthemasnonmasterstubs),changetheobject’s typefrom nonmasterto mas-ter, recordthis repositoryin theobject’s masterlocationhint, andrecordthegrantidentifierin placeof therequestidentifier.

A6. Ask theold masterto erasethegrantidentifier.

C5. Erasethegrantidentifier.

A7. Erasethegrantidentifier.

Therepositorythatis trying to acquiremastershiptriespersistentlyto completethesesteps,even if it crashesandrestartsduring the transfer, until stepA7 is fin-ished.Thusmastershipwill not belost unlessoneof therepositoriespermanentlyfails(or thenetwork is permanentlysevered)betweenstepsC4andA5, andeveninthis casetherewill bea recordof the incompletetransferin whichever repositoryremainsup.

61

Page 74: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

4.4.6 PropagatingAttrib utes

Thedefinitionof agreementsaysnothingaboutmutableattributes;two replicasof asourcemayagreebut haveentirelydifferentattributes.Eachrepositorycanchangetheattributesof bothmasterandnonmastersources,andthereis norequirementtopropagateattributechangesto otherrepositories.But in many casessuchpropaga-tion is desirable,so we have includeda featurein theattribute facility to supportit.

Section4.1.6describedattributesasa total function F from namesto setsofvalues,but this is only theuser’s view. At a lower level of abstraction,anobject’sattributesare recordedas a history of statechangesH , representedas a set oftimestampedtuples.Eachof thefour write operations(set,clear, add,andremove)takes a timestampargument,which canbe any valuebut defaults to the time atwhich the operationwas requested.Applying an operationinsertsa new tupleinto H consistingof the nameof the operationandits arguments.F � H � is thencomputedwhenever neededby sorting H into timestamporder(with ties brokenby takingtheoperation,name,andvalueassecondarysortkeys),andapplyingtheresultingsequenceof operationsto theemptyfunction.

In additionto the high-level operationsthat query F , we alsoprovide a low-level operationto query H . This operationdoesnot necessarilyreturn H itself.Instead,it returnsa history K that is equivalentto H , in thefollowing sense.His-toriesH andK areequivalentif for any historyL, F � H L ��� F � K L � . Thatis,K mayaswell havebeentherealhistory, becauseonecannottell thedifferencebyobservingeitherthepresentstateof F or its futurestatesasmoreoperationsareap-plied. Theimplementationdoesnot storeH itself, but storesanequivalentK thatis generallysmaller. For example,if thesameattribute is settwice in succession,thefirst setoperationis not retainedin K .

The history level lets us make senseof the resultswhenattribute operationsareperformedindependentlyon two replicasof thesameobjectin differentrepos-itories. If the history K A at repositoryA is propagatedto repositoryB, we cancombineit with thehistory KB simply by forming H � K A KB; thenew F � H �thengivesa well-definedandreasonablefinal statefor theobject. This techniqueis adaptedfrom Grapevine [6].

In practicalterms,then,we propagateattribute changesfrom onerepositoryto anotherby sendingthe timestampedchangetuplesof the sourcerepositorytothedestinationrepository, thenunioningthemin to thesecondrepository’s changehistory.

62

Page 75: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

4.4.7 Replication AccessControl

We neededto adda few specialaccesscontrol lists to the repositoryto provideproper security for the replication primitives. As with other ACLs, if an ob-ject doesnot have its own replicationaccesscontrol lists, it inherits the accesscontrols from its parentdirectory. The #mastership-to accesscontrol listfor an object lists the repositoriesthat mastershipon the objectcanbe cededto,#mastership-fr omlists therepositoriesthatmastershipcanbeacceptedfrom,and#replicate-fro mlists therepositoriesthatreplicascanbetakenfrom. Inaddition, if an objecthasan #replicate-from -n oac accesscontrol list, itwill acceptreplicasof theobject’s datafrom therepositorieslisted,but it will notacceptreplicasof their accesscontrol attributes (that is, thoseattributeswhosenamesstartwith #. No specialACL is neededto controlgiving replicas;readac-cessby therequestinguseris sufficient for that. Administrative accessis requiredto changetheselists.

Why aretheselistsneeded?With mastership,wedonotwantto acceptmaster-ship from a roguerepositorythatmight claim to have mastershipon a objectthatis really masteredsomewhereelse,sincethis will causeus to comeinto disagree-mentwith the repositorythat really hasmastership.We alsodo not want to giveaway mastershipon anobjectto anunauthorizedrepository. With replication,wedo not wantto copy in datafrom a roguerepositorythatmight maliciouslysupplyincorrectvalues.

For convenience,we actually allow any userto replicatedatainto his localrepository, aslong ashehasreadpermissionfor thedatain theremoterepository,hehassearchpermissiononthedirectoriesinvolvedin thelocal repository, andtheremoterepositoryis ontheproperaccesscontrollist. Becausereplicatingdatadoesnotchangeit, therewouldbenosensein requiringtheuserto havewrite permissionin thelocal repository. In anearlierversionof therepository, werequiredthesamepermissionsto replicatedatain aswe would have requiredto write new data;thiscausedseveralannoying problems,suchasausernothaving permissionto replicatein thevalueof anobject’s#owner attributeunlesshewasalreadytheownerin thelocal repository, eventhoughthevalueto becopiedin from theremoterepositorywouldmake him theowner.

4.4.8 The Replicator

Theprimitiveslisted in theprevious sectionsgive usa safeway to copy dataandtransfermastershipbetweenrepositories,but they arequite low-level. In this sec-tion webriefly describeahigher-level replicator. It is availablebothasastandalonetool (vrepl) thatcanbeinvokedfrom thecommandline andasa library thatcanbe

63

Page 76: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

calledby othertools.Our replicatortakesasinput a setof pathnamepatternsandthe network ad-

dressesof two repositories,asourceandadestination.Thereplicatorwalksthedi-rectorytreeof thesourceto find all namesthatmatchthepatternsandcopiesthosethatarenotalreadypresentat thedestination.It alsoupdatesthemutableattributesof everynamethatmatchesby mergingupdatetuplesfrom thesourceinto thedes-tination.As a trivial example,thecommandvrepl -d east.vestasys.org -e+/vesta/west.vestasys.org/vesta/repos/LAST would replicatethe highest-numberedver-sionof theVestarepositorysourcecodefrom the local repositoryinto the reposi-tory ateast.vestasys.o rg . Thefull patternlanguageis similar to Unix shellglob patternswith someextensions(suchasthe pattern“LAST”, which matchesthehighestversionnumberthatis notastub).Prefixingapatternwith “+” addstheobjectsthatmatchit to thesetto becopied;prefixingit with “ -” removesthem.

Thereplicatoralsohasa featurethatreplicateseverythingneededto do a par-ticular Vestabuild. For example,the commandvrepl -s west.vestasys.org -e@/vesta/west.vestasys.org/vesta/repos/124/.main.ves wouldreplicateall thesourcecodeneededto recompileversion124 of the repositoryimplementationfrom thewest.vestasys.o rg repositoryinto the local repository, including theentireprogrammingenvironment (libraries, compilers,etc.) that was available to thebuild. This featureworksby first parsingthebuild description(written in theVestasystemdescriptionlanguage),walking its import tree,andemittinga + patternforevery packagefound; thenpassingthesepatternson to thebasicreplicationalgo-rithm. In practice,it hasturnedout that @ is usedfar moreoften than+ and -are.

The replicatortool can be usedto “push” sourcesfrom the local repositoryto a remoteone, to “pull” sourcesfrom a remoterepositoryto the local one,oreven to copy sourcesbetweentwo different remoterepositories.The repositorydoesnot currentlyincludesupportfor automaticallytriggeringa replicationwhennew packageversionsarechecked in, thoughthereis somesupportfor automaticreplicationin vcheckin, describedbelow. Whentwo sitesarecooperating,it workswell to setup a periodicrun of the replicator(say, from a Unix cron job) to copynew sourcesbetweenthem.Thereplicatorcanalsoberunmanuallywhenneeded.

Performancemeasurementson thereplicatorarepresentedin Section9.3.5.

4.4.9 Cross-RepositoryCheckout

Whentwo sitesrunningseparaterepositoriesarecloselycooperating,usersat onesite may want to checkout packageswhosemastercopiesarein the othersite’srepository. In thissectionweoutlinehow weextendedthedevelopmentcycletoolsdescribedin Section4.1.4to supportthis.

64

Page 77: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

= Immutable = Master Appendable = Nonmaster Appendable = Master Stub = Nonmaster Stub = Mutable

givereplica

create,then cede

mastershipcreate,

then cedemastership

4

4

/vesta/west.vestasys.org

common

thread

checkout

3

copy

acceptmastershipaccept

mastershipcopy

getreplica

4

4

0

thread

3

/vesta/west.vestasys.org

common

thread

checkout

/vesta−work

jones

Figure4.7: Cross-repositorycheckout.

65

Page 78: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Almostall of thechangesfor cross-repositoryoperationwerein vcheckout, asshown in Figure4.7. Thesourcerepository, wherethepackagebeingcheckedoutis mastered,is shown at the top; the destinationrepository, which the userdoingthecheckout wantsto work in, is shown at thebottom. Notice that theactionsinthe destinationrepositoryaresimilar to the single-repositorycasein Figure4.2.Thestepsin cross-repositorycheckout are:

1. Examinethemasterreplicain thesourcerepositoryto find thehighestver-sionnumber.

2. If this versiondoesnot exist in thedestination,call thereplicatorto copy itin.

3. Createthereservationstubandemptysessiondirectoryin thesourcerepos-itory.

4. Call thereplicatorto copy themto thedestinationrepository.

5. Transfermastershipon themfrom thesourceto thedestination.

6. Insertversion0 in thesessionandcreatetheworking directoryat thedesti-nation.

No changeswere neededto vadvance, sinceboth the working and sessiondirectoriesarein thedestinationrepository.

It would not have beenstrictly necessaryto changevcheckin either, sincevcheckout movesmastershipof thereservation stubto thedestinationrepository.However, it is likely that the sourcerepositorywill want a copy of the new ver-sion soon,so we modifiedvcheckin to call the replicatorandcopy it thereaftercheckingit in locally.

TheVestatoolsfor creatingnew packages(vcreate), branchingtheversionse-quence(vbranch), finding the latestversion(vlatest), andfinding who haspack-agescheckedout (vbranch) alsorequiredminor modificationsto becross-reposi-tory aware.Thechangesrequiredweresimilar to whatwasdoneto vcheckout butconsiderablysimpler.

Oneproblemcurrentlyremainswith thecross-repositorytools. In thesingle-repositorycase,eachof our tools usesthe repository’s short atomic transactionsupportto make its completeactionatomic. This supportdoesnot work acrossmultiplerepositories,sothetoolsbecomenonatomicin thiscase.With vcheckout,steps(3) and(6) areindividually atomic,but if thereis a failurebetweenthem,thecheckout is left in an incompletestate. We have not yet automatedthe recoveryfrom thisstate,but it is nothardto recover manually.

66

Page 79: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Theperformanceof thetools in thesingle-andcross-repositorycasesis com-paredin Section9.3.4.

67

Page 80: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 5

SystemModeling Language

Therearetwo inputsto theconstructionof a softwaresystem:thesourcesandtheinstructionsfor producingthe systemfrom thosesources.For small codebases,the form taken by theseinstructionscanbe rathersimplistic. However, for evenmoderatelylargesystems,experienceshows thata build languagewith moreflex-ibility andsupportfor abstractionmakes it easierto specify the constructionofsuchsystems.For example,becauseMake doesnot provide any abstractionfacili-ties,writing Make build descriptionsfor suchsystemsis a bit like doingassemblylanguageprogramming.

By contrast,Vesta’s build language,or systemmodelinglanguage, is a full-fledgedprogramminglanguagethatsupportscomplete,hierarchicalbuild descrip-tions, andis vastly morepowerful andflexible thanMake’s. Moreover, the lan-guage’s supportfor functional abstractionmakes it possibleto encapsulatelow-level building tasks,therebysimplifying the systemdescriptionswritten by endusers.

This sectiondescribesthe Vestasystemmodelinglanguageandthe structureof Vestasystemmodels. It alsodescribesthe standardconstructionenvironmentwe have written for building C, C++, andModula-3programs,givesexamplesofmodelsfor differentkindsof packages,anddescribesthemechanismsprovidedbytheconstructionenvironmentfor performingcustomizedbuilds.

Although this sectiondescribessomeof the importantfeaturesof the Vestasystemmodelinglanguage,it is by nomeanscomplete.Rather, its aimis to conveytheform takenby typicaluser-written models,to show thatthey aresimple,andtopresentoneway of organizinga standardconstructionenvironmentthat supportsflexible customizationmechanisms.AppendixA presentsthelanguage’s completesyntaxandsemantics.

68

Page 81: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

5.1 Moti vation

In Vesta,the instructionsfor building a systemarecontainedin systemmodels.Systemmodelsareprogramswritten in theVestasystemmodelinglanguage.Theydescribehow to build asoftwaresystemfrom sources,i.e., from scratch.While thesystemmodelsarebeingevaluated,tools like compilersand linkers are invokedto build the program;the resultingbinariesare typically returnedas part of theevaluationresult.

A few essentialrequirementsdictatethestructureandfunctionalityof thesys-temmodelinglanguage:

� Thebuilder (i.e., thelanguageinterpreter)mustbeableto constructsystemsrepeatably, incrementally, andconsistently.

� Thecomplexity of asoftwaredescriptionshouldbeproportionalto thecon-ceptualcomplexity of building thesystemit describes.

� The languagemust be practical for developersto use; that is, it must beadaptableto avarietyof softwaredevelopmentmethodologiesandorganiza-tional processes.

It follows that themodelinglanguageshouldbeorganizedso thatevenerrorsin systemmodelscannotinterferewith repeatableandconsistentbuilds. Moreover,thelanguageshouldsupportincrementalbuilding asthenorm,sothatgoodperfor-manceis therule,not theexception.It alsofollows thatthecorelanguagefacilitiesshouldbeasbasicand“methodology-neutral” aspossible,andthatsupportfor par-ticular stylesof systemconstructionor organizationshouldbeprogrammedin thelanguage,notbuilt in to thelanguageor thebuilder.

Theserequirementsandimmediateconsequencesestablishthe major proper-tiesof theVestalanguage:

� Repeatabilityand consistency give rise to two properties:all informationrequiredto build a systemfrom sourcesis capturedin systemmodels,andall sourcesareimmutable.

� Incrementalityleadsto the choiceof a functional language,becauseeachfunction invocationthenrepresentsa unit of work that canbeconvenientlycached.

� “Proportionalcomplexity” is achievedby providing aflexible modularstruc-turein which reusableabstractionscanbeeasilydefined.

69

Page 82: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

� Methodologicalneutralitycompelsa carefulchoiceof basicdatatypesandprimitive operations.

It hasbeenrightly saidthatonecanwrite a badprogramin any language,andtheVestamodelinglanguageis no exception.But experiencedemonstratesthat itis possibleto write goodprograms(that is, concise,precisesystemdescriptions)in this languagewithout greatdifficulty. Thefacilitiesof theVestalanguageallowdevelopmentgroupsto organizetheir systemdescriptionsto fit their developmentmethodology, while achieving the centralbenefitsof repeatable,consistent,andincrementalbuilds. The remainderof this sectioncoversthe importantlanguagefacilities,andshowshow they canbeusedto modelaparticulardevelopmentstruc-ture.

5.2 LanguageHighlights

UnlikesoftwareconstructionlanguagessuchasMake’s,theVestasystemmodelinglanguageis a full-fledgedprogramminglanguagewith a well-definedsyntaxandsemantics(seeAppendixA).

The Vestamodelinglanguageis functional,modular, dynamicallytyped,andlexically scoped.Its valuespacecontainsbooleans,integers,text strings,lists,clo-sures,andbindings.Thefirst five datatypesarethefamiliar onesfrom Algol-likelanguagesandLISP. Bindingsareorderedlists of name-valuepairs.Thelanguagecontainsabout60built-in functionsfor arithmeticandbooleanoperations,for basicmanipulationsof texts, lists,andbindings,andfor invokingexternaltools.Thereisalsoabuilt-in functionfor applyingaclosureto alist of valuesin parallel,whichweuseto achieve coarse-grainedparallelcompilation.Althoughall valuesaretyped,andoperationsaretype-checked at execution(interpretation)time, statictyping isoptional.1

Thereareclearadvantagesanddisadvantagesto usinga completefunctionalprogramminglanguageas the basisfor software descriptions.The main advan-tageis that a functional languageforms a tractablebasisfor cachingof functioncalls,which in turn is thebasisfor Vesta’s incrementalbuilding mechanism.Themain disadvantageis that new usersmust learn a new language,different fromotherscriptinglanguagesfor programconstruction.We considertheadvantageofreliableincrementalbuilding to beoverwhelminglyimportant,andthereforehavechosento designa new language,but we have doneall we can to minimize thebarrierthisposesfor users.To reduceinitial unfamiliarity, theVestalanguageuses

1Therearesimpleprovisionsin thesyntaxfor annotatingthetypesof variablesandfunctionre-sults.However, theseannotationsserveonly ascomments;they areignoredby thecurrentevaluator.

70

Page 83: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

a C-like syntaxandC semanticswherever practical. To reduceconceptualunfa-miliarity, Vestaprovidessimpletemplatesfor systemmodelsdescribingcommonsituations,anda“library” of functionsfor systemconstructionthatcanbeinvokedfrom morecustomizedmodels. Thus in mostcases,writing a model reducestofilling theblanksin astandardtemplate.

Short,simpletemplatesrequiretheability to decomposethecompletedescrip-tion of a systembuild into componentparts. That is, the Vestalanguagemustsupportorganizingthe instructionsfor building a systeminto a set of modularunits. This is accomplishedby enablingonesystemmodelto reference,or importanother, yielding a hierarchicalstructurethat canreflectthe componentstructureof thesoftwaresystemitself.

Modular structuresaredesignedto localizeinformation,which is generallyawisemethodologicalprinciple for organizingsoftwaresystems.However, thena-tureof thebuild processfrequentlyrequiresbroad,systematicalterationsof defaultbehavior, andthedescriptionlanguagemustaccommodatethesesituationsgrace-fully. For example,a customizedactionmayapply to an entirebuild (“build thisprogramandall the libraries it useswith debuggingsymbols”)or to a large partof it (“build thesymbolicintegrationlibrary with optimization”). In practice,thismeansthat the functionsfor building a systemmustbesufficiently parameterizedsothat theconstructionof the individual componentscanbesensiblycustomizedby thecallersof thosefunctions.

How is the potentialchaosof customizationcontrolled?By extensive useofparameterizationin thesystemdescriptions,it is possibleto localizetheactualcus-tomizedvaluesof parametersneartheroot of thehierarchyof systemdescriptionmodules.To facilitatecustomizedbuilds, thecurrentconstructionparametersarecollectedtogetherin a singlecompositevaluecalledtheenvironment. In additionto parametersthatcontrolhow tools like thecompilerandlinker areinvoked, theenvironmentpassedto eachsuchtool containsa completerepresentationof thefile systemin which the tool is to run. TheVestalanguage’s binding type is usedto representenvironments,andthelanguageincludesoperatorsfor easilymergingandoverridingparticularpartsof theenvironment.

TheVestanotionof environmentis central.A preciselyspecifiedbuild is noth-ing morethanaseriesof tool invocations(e.g.,compilesandlinks) in acontrollednamingenvironment.Thatenvironmentchangessubtlybut crucially on eachtoolinvocation,andtheprocessof constructingthesemany slightly differentenviron-mentsmustbebothconvenientto expressandefficient to implement.So,theVestalanguageprovides:

� a mechanismby which the currentenvironment is easily passedbetweenfunctions,

71

Page 84: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

� a bindingdatatypeusedto representnamingenvironments(includingbuildcustomizationsandfile directories),aswell aslanguagefacilities for easilycreatingandaugmentingbindings,

� a languageprimitive for executinga tool in aparticularenvironment,and

� a closure datatypefor delayingtool invocationsuntil thefiles they produceareneeded.

Thenext four subsectionsdescribetheseparticularaspectsof theVestasystemmodelinglanguagein moredetail.

5.2.1 The Envir onment Parameter

As statedpreviously, Vestabuilds areparameterizedandcustomizedby amalga-matingfile systemsandbuild optionstogetherin a singlecompositebindingvaluecalledtheenvironment.Sincetheenvironmentplayssuchanimportantrole, thereis specialtreatmentfor it in theVestalanguage.

In particular, every Vestafunctiontakesanimplicit final parameternamed“ . ”(pronounced“dot”) denotingthecurrentenvironment.Thatis, a functiondeclaredwith n explicit formal argumentsactuallyhasn � 1 arguments,the lastbeingtheimplicit formalparameternamed“ . ”. Thename“ . ” is alegalVestaidentifier, soitcanbeusedasanormalvariablename.A functionwith n formalargumentscanbecalledwith eithern or n � 1 actualarguments.In theformercase,thecurrentvalueof “ . ” is boundto theimplicit formalparameter“ . ”; in thelattercase,then � 1stactualparameteris boundto theimplicit formalparameter“ . ”. In ourexperience,theenvironmentis rarelypassedexplicitly; it is usually“inherited” from caller tocallee.

5.2.2 Bindings

Amongotherthings,thecurrentenvironmentembodiesbothbuild customizationsand the file systemin which tools are invoked. A convenientdatatype for rep-resentingboth thesekinds of valuesis the binding, an orderedmapfrom names(strings)to arbitraryVestavalues.Bindingscanbenested.

Build customizationsare representedby hierarchicalarrangementsof map-pingsfrom customizationnamesto values.For example,Figure5.1shows a bind-ing thatmight beusedto specifytheoptionsfor compilingandlinking C++ pro-grams.Thecodein Figure5.1ademonstratesthesyntaxfor constructingabinding:

[ name1 ��� alue1 � ����� � namen ��� aluen ]

72

Page 85: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

denotesthe binding with namesnamei and correspondingvalues � aluei . Fig-ure5.1bshows atreerepresentationof thebinding.Thehierarchyarisesbecauseabindingis itself avaluein theVestalanguage.

[ Cxx= [ switches= [ compile= [opt="−O1"], link= [strip="−s"] ]]]

Cxx

switches

compile link

opt="−O1" strip="−s"

(a)

(b)

Figure5.1: The codefor constructinga binding of build options(a),andthesamebindingrepresentedasa tree(b).

Binding datastructurescanalsobe usedto representfile systems.Eachdi-rectoryin sucha file systemis representedby a binding that mapseachnameinthedirectoryeitherto a subdirectory(a nestedbinding)or to a file (a text value).2

Hence,adirectoryd containingfilesnamedf1 and f2 with correspondingcontentsc1 andc2 wouldberepresentedby thebindingd � [ f1 � c1 � f2 � c2].

TheVestaevaluatorstoresbindingsin mainmemory, makingthetraversalandmanipulationof these“directory” structuresvery efficient comparedto conven-tionalfile systems.Sincebindingsarelightweight,new file systemscanbecreatedandaugmentedon thefly for theparticularneedsof eachtool. In traditionaldevel-opmentenvironments,creatingsuchcustomizedfile systemswould betoo expen-sivebecauseit would requirediskoperations.Instead,Unix toolsusecumbersomemechanismslike searchpaths.In Vesta,a customfile systemis easilyconstructedfor eachtool, eliminatingtheneedfor longsearchpathsandmakingthefile choicesmoreexplicit.

TheVestalanguageincludessyntaxandoperatorsfor creatingbindings,select-ing a binding elementby name,merging two bindings,andsubtractingelementsfrom a binding. Bindingsareconstructedusingthesquarebracket syntaxshownabove. The valuenamedn in the binding b is selectedby writing b/n. For ex-ample,the compilerswitchesareselectedfrom the binding namedoptions inFigure5.1 by writing options/Cxx/swi tc hes/c ompi le , which evaluatesto thesingletonbinding [ opt = "-O1" ] .

2Becausefile contentsarerepresentedabstractlyin the languageby text values,the implemen-tation must be able to handlelarge text valuesthat are readand written as files; it doesthis byrepresentingsuchvaluesinternallyaspointersto files storedin therepository(Section4.2.1).

73

Page 86: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

b1 b2 b1 + b2 b1 ++ b2

foo=1 bar bar baz foo=1 bar baz foo=1 bar baz

a=1 b=2 b=8 c=9 d=1 b=8 c=9 d=1 a=1 b=8 c=9 d=1

Figure 5.2: The resultsproducedby overlaying (+) and recursivelyoverlaying (++) two bindings,b1 and b2. Notice that in b1 + b2, thebar componentof b1 is ignored,while in b1 ++ b2, thebar componentsof b1 and b2 are overlaid recursively, with the valuesfrom b2 takingprecedence.

Bindingsaremergedusingtheoverlay (+) andrecursiveoverlay (++) opera-tors.Theexpressionb1 + b2 is thebindingcontainingtheunionof thenamesin b1

andb2; for thosenamesthatappearin bothbindings,thevalueboundto thenamein b2 takesprecedence.Thebindingb1 ++ b2 is likeb1 + b2, exceptthatwherebothbindingscontainthesamename,andwhenthatnameis boundto anestedbindingin each,thenestedbindingsareoverlaidrecursively. Figure5.2shows anexampleapplicationof the+ and++ bindingoperators.

The overlay and recursive overlay operatorsprovide the basisfor extendingfile systemsandfor applyingcustomizations.For example,if “ . ” is a bindingdenotingthecurrentenvironment,and“ . /root” is therootof afile system,thentheenvironmentcanbeextendedto includetheextradirectoriesandfilescontainedinthebinding fs by writing:

. ++= [ root = fs ];

(As in C, “a op= b” is shorthandfor “a = a opb”.)Compilationoptionscanalsobe overriddenusingoverlays. For example,to

changethebuild “options” above soasnot to useoptimization,youcouldwrite:

options ++=[Cxx = [switches = [compile = [opt = "-O0"]]]];

Suchbindingsareso commonthat the languageincludesa shorthandfor writingthem.For example,thesameoverlaycanbewrittenasfollows:

options ++= [ Cxx/switches/compile/opt = "-O0" ];

By definitionof the recursive overlayoperator, this assignmentleavesthe linkingoptionsunchanged,but it bindstheopt field of thenestedcompile binding to“-O0”. Both thisexampleandthepreviousonedemonstratehow selectiveportions(i.e., subtrees)of theenvironmentcanbechangedby a singlesourcestatementinaVestamodel.

74

Page 87: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

5.2.3 Tool Encapsulation

Consistentwith thefunctionalnatureof thelanguage,invocationsof externaltoolslike compilersandlinkersareexpressedasfunctioncalls. To call anarbitrarytoolin an encapsulatedenvironment,the Vestalanguageprovidesa built-in primitivefunctioncalled run tool . (To avoid nameconflictswith user-definedfunctions,all primitive functionsin the Vestalanguagebegin with an underscore.)Hence,thereis a singleuniform mechanismfor encapsulatingnew tools with Vesta;nomodificationsto a tool arerequiredto invoke it. Tool invocationsvia run toolarelaunchedby aruntoolserverrunningonamachineof theappropriateplatform.3

The run tool primitive is parameterizedby a commandline and severalotherargumentsdescribingthe environmentin which the tool shouldbe run. Itreturnsa bindingwith fieldsdescribingthetool’s outcome.Thecompletespecifi-cationof run tool is givenin AppendixA. Amongthe run tool argumentsareflags for specifyingthat the function call result shouldnot be cachedin theeventof variouserrorconditions,for example,if thetool returnsa non-zerostatuscodeor if it writesanything to thestandarderroroutput.

Of course,theenvironmentparameter“ . ” is alsoanargumentto run tool ;it encapsulatesboth the tool’s environmentvariablesandthefile systemin whichthetool is run. Environmentvariablesarepassedin ./envVars : if it is defined,thevalue./envVars is takento beabindingthatdefinesthenamesandvaluesofenvironmentvariablesto besetfor thetool’sexecution.Thefile systemseenby theencapsulatedtool is passedin ./root . In particular, any absolutepathnamesref-erencedby thetool arelookedup in ./root , while relativepathnamesarelookedup in the nestedbinding ./root/.WD . As describedin Chapter3, file systemreferencesaretrappedby therepository, lookedup in ./root or ./root/.WDby theevaluator, andrecordedby theevaluatorasdependenciesof the run toolfunctioncall. All environmentvariablesdefinedin ./envVars arealsorecordedasdependencies.4

Althoughthe run tool functionprovidesa flexible meansfor invoking ex-ternaltools,it is ratherprimitive. Wedonotexpectmostuser-written systemmod-elsto call it directly. Instead,weprovidereusablelibrariesof Vestafunctionscalledbridgesaspartof thestandardconstructionenvironmentfor interfacingto develop-

3To determinewhichmachineto contactfor agivenplatform,theevaluatorconsultstheVestasiteconfigurationfile, which containsa list of suitablemachinesfor eachplatformtype; it theniteratesthroughthe list to find an unloadedor lightly loadedmachine. If the evaluatoris doing a parallelbuild, it mayhave severaltoolsrunningon oneor moremachinesat thesametime.

4We would have preferredto recorddependencieson only thoseenvironmentvariablesreadbythetool, but thereis nomechanismfor trappingsuchreferences.In any event,wehavenever neededto setany environmentvariablesbeforeinvokinga tool, sowehavenotexperiencedany falsedepen-denciesonunusedenvironmentvariables.

75

Page 88: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

menttools(seeSection5.4below). We alsoexpectbridgecodeto hidethedetailsof invoking run tool appropriatelyfor multi-platform andcross-platformde-velopment.However, our currentbridgesdo not yetprovide suchsupport.

5.2.4 Closures

A closure is a functionpairedwith a context thatsuppliesvaluesfor all of theun-boundvariablesappearingin thefunction’s body. In Vesta,closuresarefirst-classvalues;that is, they canbe passedasparametersor embeddedin datastructuresjustasany othervaluecan.

EachVestamodeldefinesa closurewith a singleimplicit parameter(namely,“ . ”). Hence,amodel’sbodycanbeinvokedusingthenormalfunctioncall syntax.At thestartof anevaluation,theVestaevaluatorinvokesthemodelon which it isrunby calling it with anemptyenvironment.

Closureshave severaluses.For example,acollectionof callablefunctionscanberepresentedby a bindingthatmapsfunctionnamesto closurevalues.Considerthefollowing model:

{f() { /* body of closure f here */ };g() { /* body of closure g here */ };return [ x = f, y = g ];

}

Wheninvoked, this modelreturnsa bindingcontainingfieldsnamedx andyboundto theclosuresf andg, respectively. If that resultbindingwasassignedtothevariableb, thentheclosureg couldbeinvokedby writing b/y() .

Closurescanalsobeusedto effect lazy evaluation:a lazy valuecanberepre-sentedby a closurefor computingthat value. The work requiredto computethevaluecanthusbedelayeduntil it is needed.This way, costlycomputations(suchascompilations)whoseresultsmight not be usedarenot performed. Suchlazyevaluationis not generallyrequired,but in thefew caseswhereit is needed,it canbeimplementedexplicitly usingclosures.

5.3 Model Organization

With thelanguagepreliminariestakencareof, we cannow focuson theorganiza-tion of softwaredescriptionsin Vesta.Therearetwopartstoasoftwaredescription:a list of the sourcesthat comprisethe software,andthe instructionsfor buildingthosesourcesto producederivedobjectfiles, libraries,andexecutables.In Vesta,thesetwo piecesareunifiedin files calledsystemmodels,or modelsfor short.

76

Page 89: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

5.3.1 Propertiesof Models

Models have two importantproperties. Thesetwo properties,togetherwith theimmutabilityof all sourcefiles,arethebasisfor repeatabilityin Vesta.

First, like all sourcesin the Vestarepository, all modelsare versioned,im-mutable,andimmortal.Hence,theinstructionsfor building aparticularversionofa softwareartifact arenever lost. In fact,modelsareregular files that arepart ofversionedpackagesjust like sourcefiles.

Second,modelsarecomplete:theresultof amodelevaluationdoesnotdependon any file, environmentvariable,or otheraspectof thesystemon which Vestaisbeingrun that is not explicitly specified,eitherby themodelitself or by a modelthatit imports(directlyor indirectly). Putanotherway, anevaluatedmodelandthemodelsit importsform a completerecordof thesourcesandtoolscontributing toabuild.

This secondpropertyimplies that in additionto theparticularversionof eachsourcefile andmodel that contributesto a build, the modelsmustalsonametheparticularversionsof all externaltoolsinvokedduringthebuild. An in-housetoolfor which sourcesareavailablemaybedescribedby a modelfor building thetoolfrom source. A tool for which no sourcesareavailable is describedby a modelthatsimply returnsthepre-built tool binaries(which arestoredas“source” in therepository).Both kindsof toolsareencapsulatedin thesameway by thestandardconstructionenvironment.

At this point, you mayimaginethatmodelswill bedifficult to maintain,sinceit wouldseemtediousto haveto nameandupdateaversionnumberfor eachsourcefile referencedwithin a model.However, theburdenof referringto correctsourceversionsis mitigatedby threefactors. First, sourcesareversionedandimportedat the granularityof packages(directorytrees),not individual files. Second,dueto thehierarchicalarrangementof packages,updatingtheversionof asinglehigh-level import can indirectly import new versionsof many packageslower in thehierarchy. Third, amodeldirectlyreferencesonly sourcefilesthatexist in thesamepackageasthemodelitself, thoughit mayimportmodelseitherinsideandoutsidethe package.Referencesto local sourcefiles and local modelsdo not requireaversionspecificationbecausethepackageis versionedasawhole;thesereferencesareimplicitly boundto files in thesamepackageversionasthemodelitself.

Referencesto modelsin otherpackagesdo requirean explicit versionspeci-fication. This requirementis sensible:to incorporatea new versionof a differentpackage,youhave to explicitly updateyourbuilding instructions.Putanotherway,eachdeveloperhascompletecontrol over when to incorporatenew versionsofotherdevelopers’changesinto his or herown build. Sincethis is a fairly commonoperation,it needsto beeasyfor usersto carryout. We thereforeprovide a vup-

77

Page 90: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

date tool for quickly updatinga package’s importsfrom thecommandline. Thereis alsoavimports tool for listing thetransitive closureof amodel’s imports.

5.3.2 Requirementson Model Organization

A systemmodelis the unit of modularityin a Vestasystemdescription.Thus,acompletedescriptionis built up by writing a collectionof systemmodelsthat ref-erenceeachother. But how is this to bedone?Thatis, whatorganizingprinciplesor methodologyshouldbeusedto createacollectionof understandableandusablemodels?

Of course,thereis nosingleanswerto thisquestion.To demonstratetheVestalanguage’s feasibility for this purpose,the following sectionsdescribea particu-lar organizationof systemmodels,derived from someparticulardevelopmentalrequirements,characterizedasfollows:

� We expectthat the files in a packageunderdevelopmentchangemorefre-quentlythantheenvironmentin which thatpackageis built. Consequently,it is sensibleandusefulto separatethedescriptionof thepackagefrom thedescriptionof theenvironmentfor building thepackage.

� Fromtheperspective of systemconstruction,many packageshave a similarform. For example,a packagemay build eithera completeprogramor abody of codeintendedto be linked into otherprograms. (We tendto calltheformeranapplicationandthelattera library.) Generally, themodelsforapplicationslook quitesimilar to oneanother, asdo themodelsfor libraries.To simplify andstandardizeusermodels,themodelhierarchyshouldbeor-ganizedso that commonfeaturescanbe factoredout into system-providedmodels. In effect, thesestandardforms becomepart of the developmentenvironment,availableto beinvokedby short,simplemodelswrittenby de-velopers.Of course,thestandardformscanbemodifiedor entirelybypassedin exceptionalcases.

� Occasionally, a developerwill needa specializedenvironment.Thespecial-ization might includespecialversionsof libraries,or librariescompiledina specialway or with a specialversionof a compiler. Consequently, thesystemdescriptionfor the constructionof the environmentmustbe exten-sively parameterizedto permitanindividual developerto createthecustomenvironmentneededto developapackage.

� Any constructedsystemis linkedtogetherfrom a collectionof components,typically derivedobjectfilesandlibraryarchives.Thedependenciesbetween

78

Page 91: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

thesecomponentsinduceanabstracthierarchy;in practice,thecomponentsmustbe given to the linker in a total orderconsistentwith the hierarchy’spartialorder. The library hierarchyis distinct from thepackagenaminghi-erarchy, somodelsmustrepresentthelibrary hierarchyexplicitly.

� Systemsarebuilt by executingtools(e.g.,compilersandlinkers),whichgen-erally have extensive optionsthatareneededonly for specializedpurposes.Theseoptionsmustbeaccessiblefrom Vestamodels,but mostusersof thetools never needto be awareof them. Consequently, the mechanismsforcompilingandlinking packagecomponentsmusthidethesecomplexities bydefault, without compromisinga developer’s ability to invoke the tools inarbitraryways.

Theseconsiderationsleadto theorganizationof a standard constructionenvi-ronment, which is discussedin Section5.4,andahierarchyof library descriptions,whichweconsidernext.

5.3.3 Model Hierar chies

To accommodatelarge-scalesoftware,Vestasoftwaredescriptionsareorganizedhierarchically;that is, they aretreeswith modelsasnodesandmodelimportsasedges.

For example,Figure5.3shows thehierarchyof packagescomprisingthecom-pletereleaseof ahypotheticalmail system.Directedarrows in thefigurerepresentmodel importsacrosspackages.At the lower right of the figure are two librarypackagesnamedmail/sendrcvandmail/index, for transportingandindexing mail,respectively. Thereleaseconsistsof two applicationpackagesnamedmail/inc andmail/search, which provide facilities for incorporatingnew mail andperformingquerieson mail messages.

At the root of the treeis the releasepackageitself. The releasepackagedoesnot containany sourcecode.Instead,its modelimportsthestandardenvironment,the packagesfor the two applicationprograms,andan umbrella library packagenamedmail/lib, which itself importsthetwo baselibraries.

As shown in Figure5.3,thestandardenvironmentimportstwo kindsof models:

� the bridge modelsthat export interfacesfor invoking collectionsof tools,suchaslanguage-specificcompilersandlinkers,and

� all of thestandardlibrariesrequiredby client applications(in this example,only two standardC librariesareshown).

79

Page 92: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

mail/release

common/std_env

mail/inc

mail/search

mail/lib_umb

mail/send_rcv

mail/index

common/c

common/cxx

c/libc

c/pthreads

Bridges

Figure5.3: The packagesof a hypotheticalmail systemandtheir im-ports.

As indicatedby thesubtreerootedat themail/lib umbpackage,library pack-agesthemselvesmaybearrangedin hierarchies.In fact,our standardconstructionenvironmentsupportsthreekinds of libraries: umbrella libraries, which arecol-lectionsof otherlibraries,leaf libraries, which arelibrariesbuilt from source,andpre-built libraries, whicharelibrariesthatexist only in binaryform. In Figure5.3,mail/lib umbis anumbrellalibrary, mail/sendrcvandmail/index areleaf libraries,andc/pthreadsandc/libc arepre-built libraries.

Umbrellalibrariesaremostlyaconvenience:they provideawayto packageupa collectionof relatedlibrariesinto a singleconceptualentity. They alsoencapsu-latetheorderin whichthechild librariesmustbelistedonthefinal link line to linkcorrectly;thebridgemodelsarrangeto link thelibrariesin a total orderconsistentwith thepartialorderinducedby thelibrary hierarchy. In thisexample,thereleasepackageneedonly import onelibrary packageratherthantwo. In largersoftwaresystems—suchasthe releasemodelfor theVestasystemitself, which contains9leaf libraries—theuseof umbrellalibrariesservesto make client modelssimplerandmoresuccinct.

Theparticulararrangementof modelimportsshown in Figure5.3 is not fixed.Themodelscouldhavebeenarrangedin otherways,andthecurrentarrangementispartly a consequenceof how our standardconstructionenvironmentis structured,asdescribedbelow in Section5.4.

Perhapsthemostcuriousaspectof thecurrentarrangementis thattheumbrellalibrary is importedby the releasepackage,ratherthanby the individual applica-tion packagesthemselves.An alternative arrangementis shown in Figure5.4. Themodelscertainlycould have beenstructuredthat way. The advantageto the ar-rangementshown in Figure5.3 is that the umbrellapackageis importedin onlyoneplace,so only oneimport statementmustbe updatedwhena new versionof

80

Page 93: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

mail/release

common/std_env

mail/inc

mail/search

common/c

common/cxx

c/libc

c/pthreads

Bridges

mail/lib_umb

mail/send_rcv

mail/index

Figure5.4: An alternative arrangementof packageimportsof the hy-potheticalmail systemof Figure5.3.

thatpackagebecomesavailable.

5.3.4 Imports

Recall that a model definesa closureof a single argument(namely, “ . ”). Theimport statementbindsa local identifierto theclosurecorrespondingto themodelbeingimported.For example,themail system’s releasemodelmightbegin:

importstd_env =

/vesta/west.vestasys.org/common/s td_env /9/bu ild.v es;{

// build the current environment. = std_env()/env_build("AlphaDU4.0");

// instructions for building the complete release...}

The import statementbindsthe local identifier std env to the closurecorre-spondingto version9 of thestandardenvironmentmodel.Thatmodelis calledbythe expressionstd env() . When evaluated,the standardenvironmentmodelreturnsa binding representingan abstractinterface as describedabove in Sec-tion 5.2.4. That binding mapsthenameenv build to a closurethatbuilds thecurrentenvironmentfor thetargetplatformnamedby its first argument.Theresultis boundto thedistinguishedenvironmentidentifier “ . ” for useby therestof thebuild.

81

Page 94: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

5.3.5 Control PanelModels

Certainaspectsof a packagebuild—suchaswhich versionof the standardenvi-ronmentmodelto useandany desiredbuild customizations—aremoretransientinnaturethantheinstructionsfor building thepackageitself. To facilitatethemodifi-cationof thesetransientattributes,we envisageagraphicaluserinterfaceprogramcalled the Vestacontrol panel for viewing andmodifying the attributes,and forinitiating Vestabuilds of thepackagein question.

Wehave notwrittenacontrolpanelapplication,but we imagineit wouldworkby readingandwriting its own legal Vestamodels,calledcontrol panelmodels.Eachcontrolpanelmodelwouldembodytheimportsandcustomizationsspecifiedby theuserin thegraphicaluserinterface.

In theabsenceof aVestacontrolpanel,wesimplycreateandeditcontrolpanelmodelsby hand.By convention,they arenamed.main.ves. They arehighly styl-ized, andusuallyon the orderof 10 to 20 lines long, dependingon the numberof customizationsspecified. A control panelmodelbuilds a specificinstanceofits packageby settingparameters,importinga suitableversionof thestandarden-vironment,and thenusinganothermodel in the package(conventionally namedbuild.ves) to compileandlink thepackage’s sourcecodein this environment.Thebuild.vesmodelis designedto beindependentof theversionof thestandardenvi-ronmentin use,so it canalsobecalledfrom elsewhereto build thesamepackagein differentenvironments.

A packagemayproduceseveraldifferentclassesof derivedfiles. For example,apackagemayproduceexecutablesmeantto beexportedto clients,testprograms,libraries,anddocumentation.Wedonotnecessarilywantto produceeachof theseartifactson every build. So by convention,a package’s build.vesmodelreturnsabindingthatcontainsa separateclosurefor building eachcategory of derivedfile.The control panelmodeltheninvokesonly thoseclosuresin the binding that arecurrentlyof interestto theuser. In essence,eachcategory is built lazily.

Figure 5.5 shows a typical control panelmodel. The import lines at thebeginningimportthepackage’s localbuild.vesmodelandthestandardenvironmentmodel,boundto thenamesself andstd env , respectively. Line (1) invokestheenv build function of the std env model to build an environment targetedto Alphasrunningthe Digital Unix 4.0 operatingsystem. Line (2) evaluatesthelocal build.vesmodel,binding the result to the local variableb. Finally, line (3)selectsthe lib andprogs fields out of thebinding b. Eachof thesefields is asingle-argumentclosure(theall-importantimplicit argument“ . ”). Eachclosureisinvoked,andthe resultsareboundto fieldsof a binding,which is returnedastheoverall resultof thepackagebuild.

Eachpackage’s build.vesmodel is easily implementedby simply importing

82

Page 95: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

importself = build.ves;std_env =

/vesta/west.vestasys.org/common/std _env/9 /buil d.ves ;{

// bind the standard environment to ‘.’. = std_env()/env_build("AlphaDU4.0") ; // (1)

// build selected componentsb = self(); // (2)return [ lib = b/lib(), progs = b/progs() ]; // (3)

}

Figure5.5: A typical controlpanelmodel.

separatesub-modelswithin thepackagefor building eachcategory of packagere-sult, andreturninga binding containingthe closurescorrespondingto thoseim-portedmodels.Hence,like controlpanelmodels,build.vesmodelsaresimpleandhighly stylized.Therealwork of building thepackageis deferredto theimportedsub-models,whosecontentswe describein thenext section.

5.4 The Standard Construction Envir onment

If userswererequiredto constructtheir programsusingthelow-level run toolprimitive, their systemmodelswould bequite long andcomplicated.To simplifyandstandardizeusermodels,Vestaprovidesa standard constructionenvironment,acollectionof reusablemodelsfor performingcommonbuilding operations.

This sectiondoesnot describethe workingsof our standardenvironmentdi-rectly. Instead,it presentsexamplesof client modelswritten againstthestandardenvironment,andit describesthemechanismsavailableto clientsfor customizingtheir builds. Our aim hereis not only to convey thegeneralform taken by clientmodels,but alsoto demonstratethelanguage’s power, flexibility, andutility.

We designedthe standardconstructionenvironmentto be flexible andto suitour needs,but differentdevelopmentorganizationsmight chooseto structuretheirconstructionenvironmentdifferently. Writing a new constructionenvironmentisentirelypossible;oursis justoneexample.

83

Page 96: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

5.4.1 Componentsof the Standard Envir onment

Ourstandardconstructionenvironmentconsistsof astd env modelandacollec-tion of bridgemodels.Eachbridgemodelexportsan interfaceto somecollectionof relatedtools. Vesta-2bridgesaremodelswritten in the regular Vestasystemmodelinglanguage,andthey areinterpretedby theVestaevaluatorjust like clientmodels.They arenothard-wiredinto theVestasystemin any way.

Bridge modelsmusthandlethe vagariesof interfacing to legacy tools andofsupportingcustomizedbuilds, so they tend to be much more complicatedthanclient models.Thestd env modelandthesevenbridgemodelscomprisingourstandardconstructionenvironmentareroughly1,800total linesof Vestacode.Ap-proximatelytwo-thirdsof thatcodeis in thebridgesfor C/C++andModula-3pro-grams. Thesebridgesexport functionsfor building programs,pre-built libraries,leaf libraries,andumbrellalibrarieswritten in their respective languages.

5.4.2 Building Library Models

Thestandardconstructionenvironmentemphasizestheorganizationof codeinto ahierarchyof libraries. Although librariesmaybe implementedin differentways,they sharea commonstructure.Eachlibrary is modeledasa setof interfacefilesanda setof files that implementthe interfaces. The format andarrangementofthesefiles dependon both the languagein which the library codeis written (andthuswhichbridgeis usedto modelit) andthelibrary’s placein thehierarchy.

For concreteness,we restrictourselvesfor therestof this sectionto describingthelibrary facilitiesprovidedby theC/C++bridge.Thefacilitiesof theModula-3bridgearesimilar.

Recallthatthestandardconstructionenvironmentsupportstheconstructionofthreekinds of libraries: umbrellalibraries, leaf libraries,andpre-built libraries.The bridge exportsa closurefor building eachkind of library. Sincethe clientmay wish to customizehow a library is built, it is necessaryto defer the actualconstructionof the library until it is needed.Hence,theclosuresfor eachlibrarytype do not actuallybuild anything. Instead,they embedthe library sourcefilesin a binding that servesasa setof instructionsfor constructingthe library. Thelibrary is built with theappropriatecustomizationsonly whenaprogramrequiringit is built.

This approachdiffers sharplyfrom the conventionalone, in which particularversionsof standardlibrariesareplacedin standardlocationsin thefile systemhi-erarchy. TheVestastandardenvironmentplaceshighlyparameterizedconstructionrecipesat thedeveloper’s disposalratherthanparticular, previously built versions.Thedifferencein flexibility anddevelopmenteaseis enormous.

84

Page 97: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

filesc_files = [ Lex.C, Scan.C, Index.C ];h_files = [ Lex.H, Scan.H, Index.H ];priv_h_files = [ IndexRep.H ];

{return ./Cxx/leaf("libMailIndex.a",

c_files, h_files, priv_h_files);}

Figure5.6: Themodelfor building the leaf library of thehypotheticalmail/index package.

To seehow theVestaapproachworksin practice,wefirst look at theorganiza-tion of a leaf library. Figure5.6 shows themodelfor thehypotheticalmail/indexlibrary of Figure5.3; it describeshow to build a library archive namedlibMailIn-dex.a.

In general,amodel’s files clausebindslocalvariablenamesto localfilesanddirectorieswithin thepackage;thepathsin a files clauseareinterpretedrelativeto thedirectorycontainingthemodelitself. In this case,thefirst four linesof themodelintroducethreelocalvariables(c files , h files , andpriv h files )whosevaluesarebindingsthatassociatethe listednameswith thecorrespondinglocal file contents.For example,the local variablepriv h files is assignedabindingthatassociatesthenameIndexRep.H with thecontentsof thatlocalfile.Thusthesourcefilesareseparatedinto threegroups:1) C++ sourcefiles thatmustbecompiledto produceobjectfiles,2) headerfiles thatareprovidedto clients,and3) privateheaderfiles thatarerequiredonly by theimplementation.

The body of the model consistsof a single function call. The function be-ing invoked is named./Cxx/leaf . It is the function namedleaf exportedby the C++ bridge, which in turn is part of the currentenvironment “ . ”. The./Cxx/leaf functionsimplypackagesupthefiles into asinglebindingthatwillserve asthe instructionsthat later will be usedto build the actuallibrary archive(possiblywith developer-definedcustomizations).

Figure5.7showsamodelfor building thehypotheticalmail/lib umbpackage’sumbrellalibrary. Thefirst threelinesof thismodelimport thetwo top-level modelsof theumbrella’s componentlibraries,bindingthosemodelsto thelocal variablesnamedsend rcv andindex . Thebodyof themodelfirst assignsthelocal vari-ablelibs to a list of thelibrariescomprisingtheumbrella.(A list is denotedby acomma-separatedsequenceof valuesenclosedin anglebrackets.)Notethatthelistof librariesalsoincludesthe standardpthreadsand libc libraries. Theselibrariesdo not have to be imported,but insteadareaccessedfrom “ . ”, wherethey were

85

Page 98: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

importsend_rcv =

/vesta/west.vestasys.org/mail/send_rcv /1/bu ild.v es;index =

/vesta/west.vestasys.org/mail/index/3/ build .ves;{

libs = < send_rcv(), index(), ./C/libs/c/pthreads,./C/libs/c/libc >;

return ./Cxx/umbrella("libMailUmb", libs);}

Figure5.7: The model for building the umbrellalibrary of the hypo-theticalmail/lib umbpackage.

installedby the standardenvironment. Including thesestandardlibraries in theumbrellaobviatestheneedfor applicationmodelsto mentionthem. Themodel’sfinal line invokes the C++ bridge’s umbrella function, which packagesup thesuppliedlibrariesin anumbrellawith thesuppliedname.

The library hierarchyprovided by umbrellamodelsgreatlysimplifiesthe de-veloper’s job in specifyingwhich librariesareneededto link a completeprogram,andtheorderin which they mustbe listedon the link line. Ratherthanhaving todetermineandexplicitly specify(in order)thesetof all necessarylibrary archives,asis generallyrequiredin the Unix environment,the developerspecifiesa smallnumberof higher-level umbrellalibraries. As a result,thedeveloper’s modelsareboth simplerandmore robust. The C++ bridge function that performsprogramconstructionhandlesthecomplexities of constructinga suitablecommandline fortheUnix linker by “flattening” theumbrellahierarchy.

The C++ bridge alsodefinesa function for packagingpre-built libraries. Amodelfor building a pre-built library simply invokesthis bridgefunction,supply-ing thelibrary file andits associatedheaderfilesasflat bindings.Thesemodelsaretypically evenshorterthanthemodelsfor building umbrellalibraries.

Now thatwe’ve describedhow to build the threekinds of library models,weturn to theapplicationmodelsthatusethem.

5.4.3 Building Application Models

Themodelfor anapplicationprogramspecifiesthesourceandheaderfiles5 com-prising theprogram,the library packagesto link it against,andany overridesre-

5Theheaderfiles mustbespecifiedsothey canbeaddedto thevirtual file systemin which eachsourcefile is compiled.

86

Page 99: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

quiredto build theprogramin acustomizedmanner.For example,Figure5.8 shows themodelfor building thehypotheticalappli-

cationpackagemail/search. Themodelbeginswith anenumerationof thesourcescomprisingtheapplicationandtheoverridesrequiredto build it. Theactualcon-structionoccurswhen the bridge function ./Cxx/program is invoked. Thisfunction first builds the libraries specifiedby its libs argument,which in thiscaselists a singleumbrellalibrary taken from theenvironment. It thenaugmentsthecurrentenvironmentto includetheheaderfilesin theh files argument,com-piles eachof the files in the c files argument,and links everything together.The value returnedby the function is a singletonbinding that mapsthe namemailsearch to theresultingexecutable.

Notice that thepackageexpectsto get themail umbrellalibrary mail/lib umbfrom its environment(“ . ”). Thisreflectsthestructureshown in Figure5.3,whereareleasemodelimportsparticularversionsof thestandardenvironmentandthemailumbrella,addsthemto “ . ”, andthenimportsandbuilds thetwo mail applicationsin this environment. Onecould alsowrite a controlpanelmodelto build just thesearchapplication;it too would have to chooseandimport particularversionsofthestandardenvironmentandthemail umbrellalibrary.

filesc_files = [ QueryAST.C, ParseQuery.C, Search.C ];h_files = [ QueryAST.H, ParseQuery.H, Search.H ];

{// overrideovs = [ Cxx/switches/link/shared_libs = "-non_shared" ];

// build programlibs = < ./Cxx/libs/mail/lib_umb >;return ./Cxx/program("mailsearch",

c_files, h_files, libs, ovs);}

Figure5.8: Themodelfor building thesearchapplicationof thehypo-theticalmail/search package.

As we describemorefully in thenext section,thestandardconstructionenvi-ronmentprovidesaquitegeneralmechanismfor customizingdifferentpartsof thebuild process.It supportsdifferentcustomizationsfor the compilationof a pro-gram’s sourcefiles andits libraries.For example,thefifth parameterto thepro-gram function,ovs , specifiesoverridesthatapplyto thecompilationandlinkingof only theprogram’s sources.In this case,thegivenvaluefor ovs specifiesthat

87

Page 100: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

theprogramshouldbestaticallylinked(i.e., linkednon shared ).

5.4.4 Customizing Builds

Thestandardconstructionenvironmentincludesseveraldifferentmechanismsforperformingcustomizedbuilds. Theavailablecustomizationsrangefrom buildingagainsta particularversionof an entire packageto compiling a single file of alibrary in a customizedway. In general,the mechanismshave beendesignedsothattheconceptualsizeof a customizationis proportionalto theextra instructionsthatmustbewritten to produceit.

In general,theoverridessupportedby ourstandardenvironmentcanbedividedinto two classes:general overridesandnamedoverrides. A generaloverride isusedto alter the standardenvironmentby addingor replacingbindingsin oneormorelocations.Sincethestandardenvironmentis a naminghierarchy, a generaloverrideisabindingthatdefines(new) valuesfor selectednamesin someportionofthathierarchy. A generaloverrideis appliedby recursively overlayingtheoverrideatsomepoint in thestandardenvironment.A namedoverrideis abinding(possiblycontainingotherbindings)that is interpretedlike a table,with thebindingnamesbeing the keys. Typically, the namesspecify the entitiesto which the overrideapplies,suchasasourcefilenameor a library name.Wewill seeexamplesof bothgeneralandnamedoverridesbelow.

Wenow considerthreeformsof overridingsupportedby ourstandardconstruc-tion environment.

PackageOverrides. Perhapsthemostcommonform of overrideinvolvesoverrid-ing whichversionof apackageto usein abuild. It is notuncommon,for example,for adeveloperto besimultaneouslyworkingonalibrary andaclientapplicationofthatlibrary. In thatcase,theapplicationmodelmustspecifythatthelatestcheckedout versionof thelibrary packageshouldbeused,ratherthantheoneinstalledbythestandardconstructionenvironment.

For packagesimportedby thestd env model,a package override is accom-plishedby passinganamedoverrideto thefunctionfor building thestandardenvi-ronment.Figure5.9 shows anexampleof a controlpanelmodelwith anoverrideto usea checkout versionof the c/libc library package.Comparethis versionofthe modelagainsttheoneshown in Figure5.5 (page83). The only differenceistheextrapkg ovs parameterpassedto theenv build function.This parameterspecifiesthatcheckout version7/1of thec/libc packageshouldbeused.

Build-W ide Overrides. As its nameimplies,a build-wideoverrideappliesto anentirebuild. Suchoverridesareeffectedin the control panelmodelby applying

88

Page 101: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

importself = build.ves;std_env =

/vesta/west.vestasys.org/common/std _env/9 /buil d.ves ;libc =

/vesta/west.vestasys.org/c/libc/che ckout/ 7/1/b uild. ves;{

// bind the standard environment to ‘.’pkg_ovs = [ c/libc = libc ];. = std_env()/env_build("AlphaDU4.0", pkg_ovs);

// build selected componentsb = self();return [ lib = b/lib(), progs = b/progs() ];

}

Figure 5.9: A control panelmodel that overridesthe versionof themail/index packageusedin thebuild.

a generaloverrideto thecurrentenvironmentafter theenvironmenthasbeencon-structed,but beforethepackage’s own build.vesmodelis called.

For example,thefollowing build-wide overridemight bespecifiedin thecon-trol panelmodel:

// build-wide overridecomp_switches = [ debug = "-g3", opt = "-O1" ];. ++= [ Cxx/switches/compile = comp_switches ];

This particularoverridecausesall C++ files to be compiledwith debuggingsymbolsandoptimization.Theoverrideappliesto all programsbuilt by thismodel,aswell asto all of thelibrariesthey import.

Thisexampleshowshow parameterscanbepassedto bridgefunctionsthroughtheenvironment(i.e., “ . ”). Becausetheenvironmentis passedimplicitly oneveryfunctioncall (seeSection5.2.1above),parametersstashedinsideit areavailabletoevery (bridge)function. The interfaceto theC++ bridgespecifieswhich partsoftheenvironmentareaccessedby eachof its functions.

Library Overrides. Thestandardconstructionenvironmentalsoincludesamech-anismfor overridinghow aparticularlibrary is compiled,or evenhow aparticularfile within a library is compiled.Both formsusenamedoverrides,sincethey mustnamethelibrary archive or sourcefile, respectively, to which theoverrideapplies.In thecasethattheoverrideappliesto anumbrellalibrary, theoverridealsoaffectsall descendantsof theumbrella.

89

Page 102: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

As in the caseof build-wide overrides,library overridesarepassedby recur-sively overlaying part of the currentenvironment, typically in the control panelmodel. For example,a control-panelmodelcould causethe libVestaBasics.ali-braryto bebuilt with debuggingsymbolsasfollows:

// library overrides. ++= [ lib_ovs/libVestaBasics.a =

[ ovs/Cxx/switches/compile/debug = "-g" ]];

Becausethe constructionof eachlibrary is delayeduntil it is needed,eachlibrary is built accordingto theoverridesin forceat thetimetheapplicationusingit is built. Hence,differentapplicationsin a releasecanbe built usingdifferentcustomizationsof thesamelibrary.

5.4.5 Handling Lar geScaleSoftware

For Vestato accommodatethe constructionof large scalesoftware, the namingusedby thestandardconstructionenvironmentmustbedesignedto handlea largenumberof namedartifacts.Thestandardconstructionenvironmentaddressesthisproblemby providing varioushierarchicalnamespaces.Therearetwo in particularworthmentioning.

First,thenamesusedto specifypackagesin apackageoverridearehierarchical,usinganamespacethatparallelstherepository’s packagenamespace.Noticethatin thepackageoverrideexampleof Figure5.9 (page89), thepkg ovs parameterbindsthetwo-level namec/libc to thenew versionof thepackage.

Second,theC/C++ bridgeof thestandardconstructionenvironmentprovidesan option for naminglibrariesin library overridesusinghierarchicalnames.Wehave alreadyseenthat umbrellascanbe usedto organizelibrarieshierarchically.So long as the numberof libraries is small, a flat namespacesuffices to namethem. But a flat namespaceis insufficient if the samenameis usedfor two leaflibrariesunderdifferentumbrellas.The bridgehasa modein which librariesarenamedhierarchically, therebyaccommodatingsuchcases.In this mode,a libraryis namedin a library overrideby its path in thelibrary hierarchy.

90

Page 103: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 6

Function Cache

Conceptually, every Vestabuild is donefrom scratch,to ensureconsistency. How-ever, to achieve goodperformance,Vestarememberstheresultsof previousbuildsandreusesthemwhenever possible,therebymakingall builds incremental.TheVestafunctioncacheserver is thecomponentresponsiblefor storingtheresultsofprevious builds andmakingthemavailablefor reuse.Sincethe samecentralizedfunctioncacheis sharedby all developersatasite,developerssharethebenefitsofeachother’s builds.

Recallthat theVestasystemdescriptionlanguageis functional,andthat invo-cationsof toolslikecompilersandlinkersarerepresentedin theVestalanguagebyfunctioncalls. Thepartial resultsof previous builds canthusbesaved simply bycachingtheargumentsandresultsof eachfunctionapplication.Thefunctioncachestorestheargumentsandresultof eachfunctioncall asacacheentry.

Althoughthefunctioncachewasdesignedparticularlyfor Vestaclients(namely,theevaluatorandweeder),its interfaceis generalenoughto supportotherclients,aswe will seein Section6.3 below. As oneexampleof this generality, thecachekeys storedby the functioncachearearbitraryname/valuepairs. It is theclient’ssoleresponsibilityto computethesekeys correctlysoasto make properuseof thecache.WediscusstheVestaevaluator’s cachingalgorithmin Chapter7.

For thefunctioncacheto beof any useto a client,oneobviousrequirementisthat thetime requiredby theclient to computethenecessaryresultin theeventofa cachemissmust far exceedthe time requiredby the cacheto producea cachehit. This is certainlythecasein Vesta,sincesomefunctions(suchasexternaltoolinvocations)arerelatively long running.

The rest of this chaptermotivatesthe Vestacachingproblem,describestheinterfaceprovided by the function cacheserver, lists somerequirementsthat theservermustsatisfy, andsketchesaspectsof theserver’s implementationthathelpitto meettherequirements.

91

Page 104: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

6.1 Dynamic, Fine-Grained Dependencies

TheVestafunctioncacheisdifferentfromanormalmemorycachein two importantrespects.First, theentriesstoredin thecacheareimmutable,sothecachedoesnothave to supportinvalidation,a considerablesimplification. Second,thecompletecachekey usedto lookupacacheentryisnotdirectlycomputablefromthefunctioncall site.Thispropertysubstantiallycomplicatesthedesign,but it is inherentin anycachingschemein whichthecachekey is formedpartlyfrom dependenciesthatarecomputeddynamically, a topicwe discussbelow.

By using a cachedfunction resultwhenever it is safeto do so, unnecessaryrecompilationsand other work can be avoided. But when is it safe to reuseacachedresult? Only when the evaluationcontext at a candidatecall site agreeswith thosepartsof thecontext on which someprevious call to thesamefunctiondepended.

Whendetectingdependencies,it is of courseessentialnot to omit any, or thecachewouldbeunsound,sometimesreturningincorrectresults.In particular, if thecachedid not recordsufficient dependency information for a particularfunctioncall, a subsequentfunction call might incorrectly return a hit on the previouslycachedresult.We call this situationa falsehit. It is anintolerableoutcomethat isindicative of abug in theclient’s cachingalgorithm.

However, it is alsoimportantnot to err by introducingoverly coarse-graineddependencies,or thecachewill be ineffective, sometimesfailing to returna resultwhen it should. We call this situationa false miss. Falsemissesarisebecausethecachehasrecordedanunnecessarydependency on thecalling context. If thataspectof thecallingcontext is differenton a subsequentcall, thecachewill reportamiss,eventhoughit wouldhave beensafeto usethecachedresult.

Falsemissesdonot leadto incorrectresults,but they do introduceunnecessaryinefficiency. In fact,falsecachemissescanbequitecostlybecausethey cantriggeracascadeof subsequentcachemisses.For example,if onesourcefile is recompiledunnecessarily, thenall subsequentcommandsthatusetheresultingobjectfile (e.g.,building a library thatcontainsthefile andprogramsthatusethelibrary) arelikelyto beunnecessarilyrepeatedaswell, astheobjectfile will appearto havechanged.Hence,we go to greatlengthsto avoid falsecachemisses.

For thecacheto bemosteffective, eachfunctioncall’s dependenciesmustberecordedaspreciselyaspossible.Forexample,considerthefollowing simplefunc-tion:

f(x, y, z) {return (if x > 0 then y else z);

}

92

Page 105: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Becauseof the conditionalexpression,the argumentson which this functiondependsvarydynamicallyfrom call to call. For example,in thecall f � 1 � 2 � 3� , theresultdependsonly on thevaluesof x and y; thevalueof z is irrelevant. Hence,thesubsequentinvocation f � 1 � 2 � 7� in which thevaluesof x and y areidenticalshouldproducea cachehit on thefirst call.

In thisparticularexample,theobservant readerwill noticethattheexactvalueof x is also unimportant. What mattersis simply whetheror not x is positive.Hence,to get the most accuratecaching,dependenciesshouldtake the form ofpredicateson values,not the exact valuesthemselves. For now, we will assumethat all dependenciesare recordedon values,but we describehow to representmoregeneraldependenciesin thenext chapter.

The cachingproblemis further complicatedby compositevalue types,suchaslists andbindings.For example,considera slight modificationto our previousexample,in which the y argumentis a binding:

f(x, y, z) {return (if x > 0 then y/a else z);

}

In this case,the resultof thecall f � 1 � [a � 2 � b � 5] � 3� dependsonly on xandy� a. Thesubsequentcall f � 1 � [a � 2 � b � 9] � 7� shouldproducea cachehit.Recordingadependency ontheentirebindingy wouldcausethesecondcall to geta falsecachemiss. This exampledemonstratesthat the dependenciescalculatedwith respectto compositevaluesshouldbeasfine-grainedaspossible.

From thesetwo examples,we concludethat the Vestaevaluatormust com-putefine-graineddependenciesdynamically(that is, during an evaluation). Thechallengesin accuratelyimplementinga cachingschemebasedon dynamic,fine-graineddependenciesaretwo-fold. First, algorithmsmustbe developedfor rep-resenting,computing,andpropagatingdynamic,fine-graineddependencies.Wesketchthetechniqueusedby theVestaevaluatorin Chapter7. Second,dueto thedynamicnatureof the dependencies,it is impossibleto computea singlecachekey at a functioncall sitebeforethefunctionhasbeenevaluated.We discussthisproblemnext.

6.2 The CachingProblem

To understandtheproblemintroducedby dynamicdependencies,considerthepro-totypicalcaseof cachinga compilation.Assumethata compilationis invoked byacall to thefollowing function:

compile( filename, options,.);

93

Page 106: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Here,filenameis the nameof the file to compile,options is a binding denotingthe command-lineoptionsto be passedto the compiler, and “.” is the currentenvironment.1

The currentenvironment“.” is a binding valuethat includesa representationof afile systemdirectorytree.Thisdirectorytreecontainsall thefilesnecessarytoperformthecompilation.As describedin Chapter5, theVestalanguagemakesiteasyto constructandextendbindings,soa customfile systemcanbeconstructedquitecheaplyfor eachbuild.

Figure6.1 shows a sampleenvironment. Two pathsin “.” arespecial:rootandroot/.WD . As describedin Chapter3, externaltoolssuchascompilersandlinkersarerun in anencapsulatedenvironmentin which all referencesto files aretrappedby the repositoryandservicedby the evaluator. To servicea file requestfrom the repository, the evaluator looks up the file in the currentenvironment:absolutepathnamesarelooked up in the root subtree,while relative pathnamesarelookedup in theroot/.WD subtree.

.

.WD

root

usr

defs.h hello.c include lib

stdio.h libc.a

Figure6.1: Thefile systemdirectorytreesof asampleenvironment.

Now considerthefollowing invocationof thecompilefunction:

compile("hello.c ", [debug = "-g"], .);

Let “.” be thebinding shown in Figure6.1. Wheninvoked, this function returnsthe singletonbinding that mapsthe name“hello.o ” to the derived objectfileproducedby compilingthefile boundto ./root/.WD/hello .c .

To effectively cachethiscall, theevaluatorwill computedynamic,fine-graineddependencies.This function invocationmight be found to dependon the defini-tion of the compilefunction itself, the valuesof the first two arguments,andthefollowing partsof theenvironment:

1As describedin Section5.2.1,thefinal “.” parametercanbeomitted,in which caseit is passedimplicitly.

94

Page 107: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

./root/usr/lib/ cmpl rs /cc

./root/.WD/hell o. c

./root/usr/incl ude/ st dio .h

Notice that even the referenceto the compiler (cc) is trappedandrecordedasadependency. Thecacheentry for thecompilationcombinesall thesevaluesinto acachekey, andassociatesthatkey with theresultingvalue.

It is worthnotingherewhatwemeanby “combining” valuestogetherto form acachekey. Sincethevaluesusedto form thecachekey canbequitelarge,theevalu-atorandfunctioncacheusefingerprintsof thevaluesinsteadof valuesthemselves.A fingerprint is a small, fixed-sizehashof an arbitrary byte sequence[10, 42].Fingerprintscomewith a mathematicalguaranteeboundingthe probability of acollision; by choosinglong enoughfingerprints,theprobabilityof a collision canbe madevanishinglysmall.2 As a result,fingerprintscanbe usedasa basisforequalitytests,sincewe cansafelyassumethatFP � a��� FP � b����� a � b. Twooperationssupportedon fingerprintsareextendingafingerprintby morebytesandextendingafingerprintby anotherfingerprint.In thelattercase,wewrite fp1 fp2to denotetheresultof extendingfp1 by fp2. The operationis non-commutative.

Fingerprintshave the advantagethat they are small and can be easily com-binedwithout forfeiting their probabilisticguarantee.Storingjust thefingerprintsof valuescontributing to thecachekey is sufficientbecausetheonly operationsthefunctioncacheneedsto performon suchvaluesarecombiningthemandcompar-ing themfor equality, bothof which canbedonejust aswell on fingerprintsasonthefull values.Resultvalues,of course,mustbestoredin full sothat they canbesuppliedto theevaluatorin theeventof acachehit.

We now cometo the main cachingproblemin Vesta: how cana lookup beperformedon thecachewhenall of thedependenciesarenot known until afterthefunction hasbeenevaluated?Given the useof dynamicdependencies,it wouldseemthatwhentheevaluatorreachesa new functioncall site,it mustevaluatethefunction beforeit will have the necessarykey to look it up in the cache! Obvi-ously the cachewould thenbe useless.Alternatively, the evaluatorcould searchlinearly throughtheentirecache,looking at eachentry’s dependenciesandcheck-ing whethertheirvaluesat thecall sitematchtheentry. A cachedesignedthiswaywouldbetooslow to beuseful.

Thesolutionto this chicken-and-egg problemlies in separatingthecachekeyinto two parts,primary andsecondary. Lookupthenbecomesa two-stepprocess.First, theevaluatorcomputestheprimarykey in a fixedway solely from informa-

2For safety, Vestauses128-bitfingerprints.Using thenumbersin Section3.3andotherconser-vative estimates,we computethattheprobabilityof a collision occurringover theexpectedlifetimeof theVestasystemis muchlessthan2! 42.

95

Page 108: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

tion availableat thecall site.A cachelookupusingthisprimarykey yieldsasmallnumberof candidatecacheentries.Next, thesecondarykey of eachcandidateentryis checkedto seeif it matchesthevaluesat thecall site.

In more detail, we divide the dependenciesinto two groups: thosethat areknown staticallyat thefunctioncall site,whichwecall primarydependencies, andthosethatareonly known dynamicallywhenthe function is evaluated,which wecall secondarydependencies.3 Sinceprimarydependenciesareknown at thecallsite,they canbeusedto computetheprimarykey andreducethenumberof cacheentriesthatmustbeconsideredduringacachelookupoperation.

Theprimarykey (PK) is formedby combiningthefingerprintsof theprimarydependency values. Eachsecondarydependency consistsof a nameandthe fin-gerprintof thecorrespondingvalue.Together, thesesecondarydependency namesandfingerprintsform thecacheentry’s secondarykey (SK). Overall, then,a cacheentryis a triple of thefollowing form:

"primary-key, secondary-key, result-value #

Here,theprimarykey is a singlefingerprint,thesecondarykey is a setof (name,fingerprint)pairs,andtheresultvalueis thefunction’s full resultvalue,suitableforuseby theevaluatorin theeventof acachehit.

Figure6.2 shows the primary andsecondarykeys computedfor the examplecompilationabove. First, the fingerprint Q of the compilefunction itself (a clo-surevalue)and the fingerprintsR and S of the first two function argumentsarecomputed.Thesefingerprintsarethencombinedto form a new fingerprint A, theprimarykey. Whenthefunctionis evaluated,referencesto “.” aretrappedandthenamesandfingerprintsB, C, and D of the correspondingvaluesarerecordedassecondarydependencies.Thecacheentry formedfor this functionevaluationis atriple consistingof theprimarykey, secondarydependency namesandfingerprints,andtheresultvalueof theevaluation.

It is commonfor multiplecacheentriesto havethesameprimarykey. In partic-ular, thisoccurswhenever asourcefile is editedandrecompiled.Figure6.3showsan example. In that figure, the two columnsof fingerprintson the right denotetwo differentcacheentries.Both cacheentriescorrespondto thecompilationof afile named“hello.c ” with thesamecompilationswitches,sobothentrieshavethesameprimarykey A. However, betweenthe two compilations,thesourcefile“hello.c ” hasbeenedited,so the fingerprint for the correspondingsecondarydependency haschangedfrom C to E. Sincethesecondarydependenciesfor thetwo evaluationsaredifferent,two differententriesarestoredin thecache.

3As describedin Section7.4.1,theVestaevaluatorusesheuristicsanduser-suppliedpragmastodistinguishprimaryfrom secondarydependencieswhencachingcallsto user-definedfunctions.

96

Page 109: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

compile ( "hello.c" , [debug = "−g"] , . );

Q R S

APK

./root/usr/lib/cmplrs/cc

./root/.WD/hello.c

./root/usr/include/stdio.h

B

C

D

SK

Figure6.2: Theprimarykey (PK) andsecondarykey (SK) of a singlecacheentry.

Primary Key (PK)

0 1

./root/usr/lib/cmplrs/cc

./root/.WD/hello.c

./root/usr/include/stdio.h

A A

B B

C

D

E

D

Entries

SK

Figure6.3: Two cacheentrieswith thesameprimarykey (PK).

An importantconsequenceof usingdynamicfine-graineddependenciesis thatthesetof secondarydependency namesmaydiffer from onecacheentryto thenext,evenamongentrieswith thesameprimarykey. An exampleis shown in Figure6.4,wherecacheentries1 and2 have differentsetsof secondarydependency namesbecausethefile “hello.c ” waseditedto includeafile named“defs.h ” insteadof “stdio.h ”. Thedifferencebetweenentries2 and3 is thatthefile “defs.h ”waschanged.

Figure6.4alsodemonstratesanimportantpropertyof theVestacachingstrat-egy. For Vestacachingto becorrect,all user-definedfunctionsandexternaltoolsmustbe functionalanddeterministic:thesame(possiblydynamic)inputsshouldalwaysproducethe sameoutput4. As a result,cacheentrieslike the onelabeled

4This requirementis statedmorestrictly thanis actuallynecessary. For example,somecompilersembeda timestampin theobjectfiles they generate,sostrictly speaking,thecontentsof generated

97

Page 110: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Primary Key (PK)

0 1

./root/usr/lib/cmplrs/cc

./root/.WD/hello.c

./root/usr/include/stdio.h

A A

B B

C

D

E

D

Entries

2 3 X

A A A

./root/.WD/defs.h

B B B

F

G

F

H

F

J

H

SK

Figure6.4: Multiple cacheentrieswith thesameprimarykey (PK), butwith differingsetsof secondarydependency names.

“X” in the figure shouldnot be created.The fingerprintsof cacheentry X agreewith thoseof cacheentry3 for thosesecondarydependenciessharedby thetwo en-tries,but entryX hasanadditionalsecondarydependency on thefile “stdio.h ”.Functionaltools like compilersnever producesuchentries,andneitherdo user-definedfunctionswritten in theVestalanguage.

6.3 Interface

Having describedprimary andsecondarydependencies,we arenow preparedtopresentthetwo-stepcachelookupprocessin detail.Figure6.4helpsillustratehowlookupsin thecacheareperformed.Considerthe functioncall from our runningexample:

compile("hello.c ", [debug = "-g"], .);

To testwhetherthis function call producesa hit in the cache,we first computethe primary key from the call site asshown in Figure6.2. We thenconsiderallcacheentrieswith thecomputedprimarykey. For eachsuchentry, wecomparethesecondarydependency fingerprintsagainstthe fingerprintsof the correspondingvaluesin thecurrentevaluationcontext. If thesecondarydependency fingerprintsof any cacheentry matchthoseof the evaluationcontext, we have found a hit.Otherwise,we reportamiss.

Theprotocolfor doinga cachelookupis thusa two-stepprocess.Oneround-trip to thefunctioncacheis requiredfor eachstep.

1. In the first step,the evaluatorcomputesthe primary key from the functioncall site,andcommunicatesit to thefunctioncache.In response,thecache

objectfiles dependon whenthecompileris run. However, suchembeddedtimestampsaresemanti-cally irrelevantto theuseof theobjectfiles,soit is safeto treatthecompilerasa functionaltool.

98

Page 111: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

returnstheunionof all secondarydependency namesassociatedwith cacheentrieshaving thedesignatedprimarykey. In our runningexample,thesec-ondarydependency namesarethefour pathnamesshown in Figure6.4.

2. In the secondstep,the evaluatorcomputes(the fingerprintsof) the valuesassociatedwith thosesecondarydependency namesin thecurrentevaluationcontext, andsendsthemto thefunctioncache.Thefunctioncachethencom-paresthosefingerprintsto eachof the entrieswith the designatedprimarykey, looking for a matchbetweenthe fingerprintsfrom the evaluationcon-text and thosein somecacheentry. In our runningexample,the receivedfingerprintswouldbecomparedto thecolumnsof Figure6.4; if any columnmatches,acachehit is reported.

Betweenthesetwo steps,anotherclient mayhave addeda new entrythat includessomenew secondarydependency names.To avoid reportinga falsemisson thenew entry, the cacheusesan optimistic concurrency control scheme:the secondstepmay returna result indicatingthat thenamesreturnedfrom thefirst steparestale,in which casetheclient is expectedto restartthe lookup processfrom stepone. This schemeeliminatesthe needto lock thecacheagainstupdatesduring alookupoperation.

In theeventof acachemiss,theevaluatorwill proceedto executethefunction,recordingdynamicfine-graineddependenciesalongtheway. It will thenform theprimary key, secondarykey, and result value for a new cacheentry, and call amethodof thefunctioncacheinterfacefor addingthisnew entryto thecache.

In additionto the main interfacethat supportsperformingcachelookupsandaddingnew entriesto thecache,thefunctioncachealsoexportsaninterfaceusedsolelyby theVestaweeder. This interfaceallows theweederto indicatewhenit isbeginningits work,andwhenit hasdeterminedthefull setof cacheentriesto keep.Betweenthesetwo calls,thecachedisablestheexpirationof leasesoncacheentries(describedbelow) to preventnewer entriesfrom beinginadvertentlydeleted.

6.4 SystemRequirements

In additionto implementingtheinterfacesdescribedin theprevioussection,thereareseveralotherrequirementson thefunctioncache.Thissectionhighlightssomeof thefunctioncache’s mainsystemrequirements.

$ It muststorecacheentriespersistently;properlyshuttingdown andrestartingtheservermustnotcauseany cacheentriesto belost. Persistenceis achievedby storingoldercacheentriesin diskfilesandusingacombinationof logging

99

Page 112: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

andcheckpointingfor newer cacheentries. For efficiency, log entriesarekept in memory;the interfaceto the function cacheincludesa methodforsynchronouslyflushingall pendinglog entriesto disk.

$ It mustbefault-tolerant.In theeventof acrashor otherfailure,thefunctioncachemustbeableto recover in a consistentstate.In particular, thevariouslog files andcacheentry files must be flushedin an order that allows therecovery algorithmto toleratefailure at any time. Although somerecentlycreatedcacheentriesmaybe lost in a crash,only a relatively smallamountof work hasto berepeatedto recreatethem.

$ It mustsupportfastlookup. Sincedisk readsdominatethetime requiredtodo a lookup,theformatof cacheentrieson disk hasbeendesignedsomostlookupscanbeperformedwith atmosttwo disk reads.Many lookupsresultin hits on in-memorycacheentries,which canbeservicedwithout any diskreads.

$ It mustservicemultiple clientsconcurrently. Theserver is thread-safe,andit includesfine-grainedlocksto avoid excessive lock contention.

$ It mustsupportconcurrentweeding(thedeletionof unwantedderived filesandcacheentries)withoutaffectingclientsadversely. Supportingconcurrentweedingis essentialbecausethe weedercan take a long time to run, andbecausedeletingcacheentriesis alsotime consuming.Oncewe allow forconcurrentweeding,animmediateproblemarises:thesetof cacheentriestokeepwill becomputedby theweederbasedonsome“snapshot”of thecache,soanextramechanismis necessaryto protectall cacheentriescreatedsincethesnapshotwastaken.

$ It mustbe tolerantof client errorsandclient failures. For the former, thefunctioncacheinterfaceincludesrun-timechecksof its argumentswith pro-visionsfor an error returncode. For the latter, the function cacheprotectsnewly createdandrecentlylooked-upcacheentriesfrom weedingby leases.A leaseis a “fresh” bit thatprotectsa cacheentry from weeding,but leasesexpire aftera fixedamountof time. Hence,a cacheentrycreatedby a clientthat crashescaneventuallybeweeded,sincethe leaseprotectingthecacheentrywill eventuallytimeout.

$ It mustscale.In particular, thefunctioncachehasbeendesignedto storeupto tensof millions of cacheentries.To accommodatethatnumberof cacheentries,a two-level memoryhierarchyis used.New andrecentlylooked-upcacheentriesarecachedin memory, while othersarestoredonly on disk.

100

Page 113: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Within eachdisk file, cacheentriesarestoredin a three-level hierarchyde-scribedbelow.

Theserequirementssubstantiallycomplicatethefunctioncache’s implementa-tion. It would betediousto go into completedetailhereon everythingthe imple-mentationdoesto meettherequirements.Instead,thenext sectiondescribesonlythe most importantaspectsof the implementation,focusingon information thatwill beneededasbackgroundto understandtheperformancemeasurementsgivenin Chapter9.

6.5 Implementation

Becausestep1 of cachelookup narrows the searchdown to entriesthat matcha givenprimary key, it is mostefficient for thecacheserver to organizeits cacheentrystorageby primarykey. Bothin memoryandondisk,theserverstoresentriesin groupscalledPKFiles. EachPKFile holdsall theentriesthathave a particularprimarykey.

If thecachelookupalgorithmworkedexactlyasoutlinedin Section6.3,its sec-ondstepwouldhave to searchthroughall thecacheentriesin therelevantPKFile.For eachsuchentry, it wouldhave to performafingerprintcomparisonfor eachoftheentry’s secondarydependencies.Both thenumberof cacheentriesin a PKFileandthenumberof secondarydependenciesin a cacheentrycanbequitelarge,ontheorderof hundreds.Hence,if this naive lookup algorithmwereused,lookupswould requiretensof thousandsof fingerprintcomparisonsin theworstcase,andhence,wouldbequiteexpensive.

To avoid this problem,cacheentriesareorganizedin a multi-level hierarchy.First, all entrieswith thesameprimarykey aregroupedtogether. Thentheentriesin eachgrouparepartitionedin suchawaythatonly asubsetof theentriesin eachgroupneedbeexaminedon any lookup.

To explain how this partitioningis done,we introducesomenotation.For anycacheentrye, let e% pkdenotee’s primarykey, let e% names denotethesetof namesin e’s secondarykey, and for any namen & e% names, let e%(' al ) n* denotethefingerprintvalueassociatedwith n in e’ssecondarykey. Now definethefollowing:

Entries) pk*,+ - e . e% pk + pk /AllNames) pk*,+

e0 Entries1 pk2e% names

CommonNames) pk*,+e0 Entries1 pk2

e% names

101

Page 114: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

CFP) e*3+n0 CommonNames1 e.pk2

e%(' al ) n*

ThesetCommonNames) pk* thusconsistsof thosenamesthatoccurin everycacheentrywith thegivenprimarykey. Thenamesin AllNames) pk*54 CommonNames) pk*occurin somecacheentrieswith thegivenprimarykey but notothers;wecall themuncommon. The valueCFP) e* , e’s commonfingerprint, is the resultof combin-ing thefingerprintsof all secondaryvaluesof e correspondingto e% pk’s commonnames.Due to thenon-commutative natureof the 6 operation,it is importanttoenumeratethe namesn in a well-definedorder. To this end, the function cachemaintainsa canonicalorderingfor eachpk of thenamesin AllNames) pk* ; it usesthatorderingwhencomputingCFP) e* .

Giventhesedefinitions,we cannow describehow thecacheimplementsbothstepsof the lookup algorithm. For eachprimary key pk, the cachemaintainsEntries) pk* , AllNames) pk* , and CommonNames) pk* (the last of which is repre-sentedby a bit vectorwith respectto AllNames) pk* ). In step1 of the lookup al-gorithm,thecachesimply returnsAllNames) pk* for thesuppliedprimarykey. Toefficiently performstep2 of thelookupalgorithm,thecachemaintainsCFP) e* forevery entry e. It thengroupsthe entriesEntries) pk* into equivalenceclassesac-cordingto their commonfingerprints.For example,Figure6.5shows thecommonnamesandcommonfingerprintsfor thecacheentriesof Figure6.4.

Primary Key (PK)

0 1

./root/usr/lib/cmplrs/cc

./root/.WD/hello.c

./root/usr/include/stdio.h

A A

B B

C

D

E

D

Entries2 3

A A

./root/.WD/defs.h

B B

F

G

F

H

Common Fingerprint (CFP) X Y Z Z

CommonNames

UncommonNames

Figure6.5: Thecommonnamesandcommonfingerprintsfor thecacheentriesof Figure6.4.

ThePKFile is thenpartitionedby CFPvalue;two entriesarein thesamepar-tition if andonly if they have the sameCFP. Figure6.6 shows the hierarchythatresultsfrom the cacheentriesandcommonfingerprintsshown in Figure6.5. Asshown in thisfigure,multipleentriesappearin thesameCFPgrouponly whenoneof theuncommonsecondarydependencieshaschanged.

102

Page 115: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

A

X Y Z

0 1 2 3

Cache

PK

CFP

Entries

Figure6.6: The hierarchicaldivision of cacheentriesfirst by primarykey (PK) andthenby commonfingerprint(CFP)for thecacheentriesofFigure6.5.

In step 2 of the lookup operation,the cacheis given a pk and the finger-prints f1 7 f2 7 %�%�% 7 fk of the valuescorrespondingto the namesAllNames) pk* atthe call site. Call thesefingerprints fi the call site fingerprints. To performthe lookup, the cachefirst combinesthosecall site fingerprintsassociatedwithCommonNames) pk* , therebyproducingacommonfingerprintcfp. Next, thecachedoesahashtablelookupto seeif pkhascfpasoneof its associatedcommonfinger-prints. If not, thecachereportsa miss. Otherwise,thecacheexaminestheentriesin theidentifiedCFPgroup.In testingfor a hit, only thefingerprintvaluesassoci-atedwith theuncommonnamesneedbeexamined,sinceby virtue of beingin thecorrectCFPgroup,all entriesbeingconsideredareknown to havematchingvaluesfor thecommonnames.

Theperformanceof this lookupalgorithmdependson thedistribution of CFPswithin a PKFile andon thePKFile’s relative numberof commonanduncommonnames. In the caseof a compilation,the file beingcompiledwill be oneof thecommonnames. Hence,eachtime a new versionof a file is compiled,a newcommonfingerprint is produced.The expectation,then, is that within a PKFile,therewill bemany CFPs,but eachCFPgroupingwill containrelatively few entries.Thismeansthatthenumberof cacheentriesconsideredduringalookupis expectedto besmall.

In addition,the lookupalgorithmrequiresfewer fingerprintcomparisons,andthereforeperformsbetter, thesmallerthenumberof uncommonnamespercacheentry. In thestyleof usagewe envisagedwhendesigningthecache,we expectednearlyall PKFilesto have few uncommonnamescomparedto thenumberof com-monnames.In thecompilationexamplewehavebeenconsidering,thisexpectationis not unrealistic,sincethesetof files readduring thecompilationof a particularsourcefile doesnot tendto changemuchfrom versionto version.Thishasprovenoutsofar in practice,asshown by thedistribution of PKs,CFPs,andcommonand

103

Page 116: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

uncommonnamesin anactualcachereportedin Section9.4.2.In summarythen,this lookupalgorithmresultsin two majorcostsavingscom-

paredto thebrute-forcealgorithm. First, only thoseentriesin the identifiedCFPgroupneedbe examined. Second,whenexamining the entriesin a CFPgroup,only the valuesassociatedwith the uncommonnamesneedbe examined. Thesetwo effectsthusdrasticallyreducethenumberof fingerprintcomparisonsrequiredto performacachelookup.

The function cachewasdesignedto storeon the orderof tensof millions ofcacheentries. Hence,storingall of the cacheentriesin memoryis impractical.Instead,a two-level memoryhierarchyis used. Most of the entriesarestoredinstablePKFilesondisk,andthenewly createdcacheentriesandthosecacheentriesonwhichahit hasrecentlyoccurredarestoredin volatilePKFilesin memory. Sta-blePKFilesarestoredin anormalfile systemprovidedby theunderlyingoperatingsystem.

To avoid thedisk fragmentationthatwould otherwiseoccurfrom creatingonedisk file perstablePKFile, thestablePKFilesaregroupedtogetherinto so-calledMultiPKFilesondisk. Two PKFilesarestoredin thesameMultiPKFile if andonlyif theircorrespondingprimarykeyshavethesame16-bitprefix.5 Sincetheprimarykeys areessentiallyrandom,thePKFilestendto beevenly distributedamongtheMultiPKFiles.

To look up a cacheentry, the cachelocatesthe appropriatevolatile PKFile,andfirst looks for the entry in memory. If a hit is not found there,it openstheappropriatestableMultiPKFile, andseeksto the startof the appropriatePKFile.ThestablePKFile headercontainsa tablemappingcommonfingerprintvaluestopositionsin thePKFilewheretheentrieswith eachCFParestored.TheCFPtableis storedin sortedorderby CFPsointerpolatedbinarysearchcanbeusedfor CFPlookup.

Most cachelookupsthat go to disk requiretwo disk reads: one to readtheCFPtableandoneto readthe entry in the event of a hit. In the typical caseinwhich no matchingCFPis found, only onedisk readis required. Dependingonthefilesystemblock size,thefilesystemread-aheadalgorithm,andthenumberofCFPsin thePKFile,somecachehits canactuallybeservicedin asingledisk read.However, morethantwo disk readsmay be requiredto servicea cachehit if theCFPtableor the cacheentry beingreadis larger than the filesystemblock size.Section9.4.1reportsthe time requiredto performcachelookup operationswithvariousoutcomes.

Maintainingthis cacheorganizationis complicatedby theadditionor removalof cacheentries.Whenanew entryisaddedtoor removedfrom thesetEntries) pk* ,

5Thisprefixsizeis acompile-timeconstant,andcanbeeasilychanged.

104

Page 117: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

thesetsAllNames) pk* andCommonNames) pk* canchange.Sincechangesto thelatterwould requirecommonfingerprintsto berecomputedandcacheentriesto bereorganized,newly addedentriesarekept in two sidebuffers (onefor entriesthathave all of thecommonnames,andonefor thosethatdo not). The lookupalgo-rithmmustconsultthesesidebuffersin additiontocheckingfor ahit in Entries) pk* .Oncea large enoughnumberof new entriesarecollected,they aremerged intoEntries) pk* together, therebyamortizingthe high costof recomputingall of thecommonfingerprintsandshuffling cacheentriesamongCFPgroups. Deletionstriggeredby the weederare handledsimilarly; again,new commonnamesandcommonfingerprintsarenot computedfor every deletion,but only oncefor eachPKFileasabatch.

The function cacheusesmutexes to protect its shareddatastructures. Onecentralizedmutex protectsthemaincachevariables.To reducelock contention,aseparatemutex protectsaccessto thecacheentriesandotherstateof eachvolatilePKFile. The useof a separatemutex for eachvolatile PKFile allows lookupsondifferentprimarykeysto proceedin parallel.WhenaPKFile is rewritten,its mutexisheldonlyaslongasnecessary;for example,themutex isnotheldwhile thestablePKFile is readandwhile its commonnamesandthe commonfingerprintsof itsentriesarerecomputed.This lockingstrategy allowsmostof theconsiderableworkrequiredin rewriting aPKFile to occurin parallelwith lookupsandtheadditionofnew entries,therebyimproving thecache’s overall performance.

105

Page 118: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 7

Evaluator

TheVestaevaluatorinterpretssystemmodelswritten in theVestasystemdescrip-tion language.It interactswith thefunctioncachedescribedin Chapter6 to achievehigherperformancethroughincrementalbuilding. Writing an interpreterfor theVestalanguageis straightforward, but exploiting the functioncacheeffectively isnot. This sectiondescribestheevaluator’s cachingstrategy, aswell asits solutionsto somepracticalproblemsthatarisedueto caching.

7.1 Overview

In a typicalVestabuild, theexpensiveoperationsareapplicationsof thelanguage’srun tool primitive,whichcauseexternaltoolsto beinvoked.Hence,effectively

caching run tool invocationsis crucial to achieving good incrementalbuildperformance.The relatively high costof invoking external tools meansthat theevaluatorcanafford to spendtime computingfine-graineddependenciesin orderto increasethecachehit rateon suchcalls.

Although caching run tool calls is a goodstart,suchcachingalonedoesnot scalewell. As discussedin Chapter3, it is importantfor the runningtime ofan incrementalbuild to be proportionalto the sizeof the change,not to the sizeof the systembeingbuilt. Building a large software systemfrom scratchmightentailthousandsor tensof thousandsof tool invocations.If weassumethatacachelookuptakes10–20milliseconds,anincrementalbuild usingonly run tool callcachingwould take tensof secondsto rebuild sucha system,evenif it werecom-pletelyup to date.In fact,this is preciselythebehavior exhibitedby Make,whose“cachelookups” arefile timestampcomparisonson the input andoutputfiles oftool executions. The time requiredfor theselookupsincreaseslinearly with thenumberof files involved in a build, which is the primary reasonMake doesnot

106

Page 119: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

scalewell.Therefore,it is importantto cachepartsof thebuild larger thanindividual tool

invocations.Vesta’sfunctionalmodelinglanguageprovidesanaturalbasisfor suchcaching.Eachfunctioncall in a build’s call graphcanbecached.During a build,whenever the evaluatorencountersa function call, it consultsthe function cacheto seeif the samefunction haspreviously beenevaluatedon sufficiently similararguments.If so, theevaluationof that functionand all of its descendantsin thecall graph is skipped.Becausethelanguageis functional,thereareno sideeffectsto be reproduced;theevaluatorsimply takestheprevious call’s resultvaluefromthecacheandusesit astheresultof thenew call.1

As describedin Section6.1, using dependenciesthat are too coarse-grainedcanleadto costly falsecachemisses.To avoid falsecachemisses,the evaluatorattemptsto computedependencieson only thosepartsof the evaluationcontextthatarerelevantto thefunctionbeingevaluated.Thesedependenciesarecomputeddynamically, thatis, astheevaluatoris interpretinga functioncall.

A dependency canbe thoughtof asa predicateon theevaluationstatethat isrequiredto produceacachehit. Whenviewedin thisway, adependency thatis toocoarse-grainedcorrespondsto a predicatethat is unnecessarilystrong. The goalof theevaluatoris to recorddependenciesthatareasweakaspossible,but strongenoughto capturethe relevant aspectsof thestatesoasto producecorrectcachehits in thefuture.As describedearlier, we call suchdependenciesfine-grained.

Thedependenciesrecordedby theevaluatorarenot alwaystheweakestpossi-ble predicates.Sometimes,unnecessarilystrongdependenciesareusedintention-ally becausethey aremorecompactandthereforeeasierto manipulateandcheck.Thereis atradeoff betweenthestrengthof thedependency andthecostof gettingacachemiss.In practice,wehave balancedthetradeoff sothatVesta’s performancerarelysuffersdueto unnecessarilystrongdependencies.In any event,theevaluatornever recordsdependenciesthataretoo weak. If it did, falsecachehits would bepossible,andincorrectbuilds would result.

Whencaching run tool calls, the evaluatorrecordsthe tool’s file systemreferencesasfine-graineddependencies.Several different typesof dependenciesarerecorded,eachcorrespondingto a differentkind of file systemoperation,andeachrepresentinga differentpredicateon theevaluationstate.For example,it isessentialfor correctnessthat all attemptsto opennon-existentfiles arerecorded.The operationof listing a directory is anotherkind of dependency that must berecorded.Eachof thesediffersfrom thedependency createdby successfullyopen-

1The language’s primitive functionsother than run tool are inexpensive, and thereforearenot cached.By default, all othercalls arecached.However, the languagealsoincludesa stylizedcomment,or pragma, to suppresscachingof particularfunctions,which is occasionallyuseful inbuild performancetuning.

107

Page 120: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

ing andreadinga file.Cachinginvocationsof user-definedfunctions(i.e., closures)is morecomplex

thancachinginvocationsof tools. More kindsof operationsareavailableon datastructuresin the languagethanareavailableon anencapsulatedfile system,mak-ing it desirableto usemoreprecisedependency predicates.In fact,themainchal-lengein implementingthe Vestasystemdescriptionlanguagelies in computingfine-graineddependenciesfor the language’s many operations,in order to makeeffective useof thefunctioncache.

Althoughcachinguser-definedfunctionsleadstosignificantperformancegains,it doesnot solve the scalingproblemcompletely. When one function calls an-other, mostdependenciesof thecalleetypically becomedependenciesof thecaller.Hence,if nospecialmeasuresweretaken,theroot functionof anevaluationwouldhave anextremelylargenumberof dependencies.Obviously, thatsituationcouldprevent Vestafrom scalingwell. To prevent too many dependenciesfrom beingpropagatedup to the root of an evaluation,we exploit the specialpropertiesofmodels(asdistinct from generalclosures)to allow for anefficient, coarse-grainedrepresentationof thedependenciesassociatedwith modelevaluations.Modelsthusserveascutoff pointsin thecall graph,beyondwhichoverly fine-grainedandbulkydependenciesdonotpropagate.Weexplain thismatterfully in Section7.5below.

The restof this chapteris organizedasfollows. The next sectionbriefly de-scribeshow the evaluatorrepresentsdependencies.The threesectionsafter thatdescribethe evaluator’s algorithmfor creatingcacheentriesfor the threeclassesof functionsenumeratedabove, namely, the run tool primitive, user-definedfunctions,andmodels.Section7.6thenshows theeffectof theevaluator’s cachingtechniqueson real builds. The final sectiondescribeshow the evaluatorhandleserrors.

7.2 RepresentingDependencies

As mentionedearlier, dependenciesin generalare predicateson the evaluationcontext. In a practicalimplementation,however, allowing for arbitrarypredicateswould becostly in bothspaceandtime. We thereforeusea small,fixedcollectionof predicates,encodedasdependencypaths. It is thesedependency pathsthatarepassedto thefunctioncacheasthenamesin acacheentry’s secondarykey.

Thesyntaxof adependency pathis givenby thefollowing grammar:

dpath ::= t :patht ::= V . X . D . T . L . E

path ::= 89. id/path

108

Page 121: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

A dependency pathtakesthe form t : path, wheret denotesthedependency type,andpathspecifiesthecomponentof theevaluationcontext onwhichtheevaluationdepends.We use 8 to denotean empty path; a path of the form id/ 8 or 8 /id isequivalentto thepathid.

The typeis a single-charactercodethatselectsa predicateon thepath’s valuethat mustbe satisfiedfor the dependency to match;we describethe meaningsofthe six dependency typesbelow. The path is a pathnamethat canbe looked upin the context of the function call being evaluated. It is not necessarilya pathexpressionthatwould bevalid to write at thatpoint in themodel.For example,ifa is abinding,thepatha/b wouldrepresentthefield b in a, but if a is aclosure,a/bwould representthevalueof b in a’s context.

Associatedwith eachdependency pathis a valuewhosefingerprint is passedto the cacheon a lookup operationor when a new cacheentry is created. Therulesfor computingthevalueassociatedwith a dependency pathvary dependingon thedependency typeandthekind of functionbeingevaluated.Theserulesaredescribedin moredetailbelow.

7.3 CachingExternal Tool Invocations

Caching run tool callsis fairly straightforward. Theprimarykey for thecacheentryis formedby combiningafixedfingerprintfor the run tool primitivewiththefingerprintsof all of thefunction’sargumentsexcepttheimplicit “ . ” argument.As describedin Section5.2.3, the tool’s file systemis suppliedvia the binding./root . Obviously, recordinga dependency on thecompletefile systemrepre-sentedby ./root would betoo coarse-grained,sinceit is very unlikely that thetool will accessall the files in that directory tree. Hence,the cacheentry’s sec-ondarykey is formedfrom thefile systemreferencesmadeby the tool during itsexecution.As describedin Chapter3, thetool’s file systemreferencesaretrappedby the repository, sent to the evaluator, and satisfiedby the evaluatorby doinglookupsin ./root .

Theevaluatormusthandletwo differentkindsof requestsfrom therepository,andit recordsthreedifferenttypesof secondarydependenciesbasedonthekind ofrequestandits outcome.Thetwo typesof requestsarelookup(looking up a namein a binding that representsa filesystemdirectory)and list (listing the entriesinsucha binding). Table7.1 lists thethreetypesof dependenciesthat theevaluatorcreatesin responseto repositoryrequestsduring a tool execution. We give thedetailsin thefollowing paragraphs.

Whenanamelookupsucceedsandreturnsafile, theevaluatorpassesthefile’sshortidback to the repositoryandrecordsa value (V) dependency. The depen-

109

Page 122: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Operation Dependency Path Valuelookup(dir, name) : success V :dir/name Value(dir/name)lookup(dir, name) : failure X :dir/name FALSElist(dir) D :dir domain(dir)

Table7.1: Thethreedependency typesrecordedby theevaluatorduringexternaltool invocations.

dency’s associatedvalueis thecontentsof thefile; however, recall that theevalu-atorandcacheactuallyusethefingerprintof thevaluein thedependency, not thevalueitself. In this case,the fingerprint for the file is suppliedby the repository(seeSection4.2.3).

When a namelookup succeedsand returnsa directory, the evaluatorpassesa handlefor the directory back to the repository. The tool now knows that thedirectoryexists, so in generalthe evaluatormustrecorda dependency reflectingthis fact; namely, anexistence(X) dependency on thedirectory’s pathname,withTRUE as its value. However, in the mostcommoncase,theseX dependenciescanbe optimizedout: if the tool subsequentlyattemptsto look up a namein thedirectory, theevaluatorwill generatea dependency whosepathhasthedirectory’spathasa prefix. Suchadependency subsumesthe X dependency for thedirectoryitself, sotheevaluatoravoidsrecordingthelatter.

Whena namelookup fails, the evaluatorreturnsa “not found” result to therepositoryandagainrecordsanexistence(X) dependency on thepath.In thiscasethepathis notdefined,sothedependency’s associatedvalueis FALSE.

Finally, if the evaluatorreceivesa requestto list a directory, it recordsa do-main(D) dependency. To obtainthedependency’s valuefingerprint,theevaluatorcombinesthe fingerprintsof the namesdefinedby the correspondingbinding, inorder. Thevaluesboundto thosenamesarenot consideredin thefingerprint.Thisprocedurereflectsthesemanticsof listing a directory: the resultdependsonly onwhatnamesaredefined(andtheirorder),notonwhatthenamesareboundto.

Table7.2showssomeof thesecondarydependenciesthatmightberecordedfora simplecompilationof a file named“hello.c”. Thefingerprintsin theFingerprintfield are thosesuppliedby the repositoryfor the correspondingfiles, except ofcoursefor FP(FALSE).

As mentionedpreviously, it is importantfor correctnessto recorda FALSEexistence(X) dependency when a lookup requestfails. For example,considerthefollowing commonsituation.A compilertypically searchesseveraldirectoriesfor headerfiles. Assumethat on onecompilation,a headerfile wasfound in thesecondsuchdirectorybut not the first. If on a subsequentcompilationa file of

110

Page 123: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Type Path FingerprintV ./root/usr/lib/ cmpl rs/ cc FP(cc )V ./root/.WD/hell o. c FP(hello.c )X ./root/.WD/stdi o. h FP(FALSE)V ./root/usr/incl ude/ std io .h FP(stdio.h )%%% %%% %%%

Table7.2: Someof thesecondarydependenciesfor a simplecompila-tion of thefile “hello.c”. FP denotesthefingerprintfunction.

thesamenameexistedin thefirst directory, a falsecachehit would resulton thefirst compilationunlessa FALSEexistencedependency hadbeenrecordedfor thefilenamein the first directory during the first build.2 Suchfalsecachehits areintolerable,sincethey canleadto inconsistentbuilds.

7.4 CachingUser-DefinedFunction Evaluations

Computingfine-graineddependenciesfor callsto run tool is relatively straight-forward, sincethey arelimited to file systemaccesses.But handlingthe generalcaseof cachinga call to a user-definedfunction written in the Vestalanguageisquitechallenging,andproved to be themostdifficult partof theevaluatordesignandimplementation.This sectiongivesthemathematicalruleswe have developedfor computingdependenciesduringtheevaluationof user-definedfunctions.It alsostates(but doesnotprove) acorrectnesstheorem.

To describethekey ideasusedin ourdependency calculation,wedefineasub-setof theVestamodelinglanguage.Table7.3givesthesubsetlanguage’s syntax.Here,Literal is thesetof possibleliterals (constants),andId is thesetof possibleidentifiers(variables).This subsethasbeenchosento includethecorepartsof theVestalanguagethatpresentthemostsignificantchallengesto effectivedependencyanalysis.It omits the language’s primitive functions(of which thereareapproxi-mately60), its iterationconstruct,its provisionsfor importing onesystemmodelfrom another, andits supportfor bindingprogramvariablesto versioneddirectoriesandfiles in theVestarepository.

Every expressionis evaluatedin someevaluationcontext, which is a mappingfrom variablenamesto values. Let Eval) e7 c* denotethe resultof evaluatingthe

2Somebuild systemsdonothandlethis issuecorrectly. For example,ClearCASErecordsdepen-denciesonly on existing files. Makefilesgeneratedby theUnix makedependtool have this problemaswell. In bothcases,thisdeficiency canleadto inconsistentbuilds.

111

Page 124: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

e ::=. a literal. x variable.<; x % e lambda. if e1 thene2 elsee3 conditional. [n1 + e1 7 n2 + e2 7 %(%(% 7 nk + ek] bindingconstructor. e= n bindingselection. e!n bindingdomaintest. e1 > e2 bindingoverlay. let x + e1 in e2 let construct. e1 ) e2 * functionapplication

Table7.3: The syntaxfor a subsetof our systemmodelinglanguage.Heree representsanexpression,a a literal, x anidentifier, andn aname.

expressione in thecontext c, andlet Dpnd) e7 c* denotethedependency informationresultingfrom evaluatinge in c. Sincewe usestandardcall-by-valueevaluation,therulesfor Eval) e7 c* arestraightforward,sowe will notdescribethemhere.

7.4.1 Computing Primary and SecondaryCacheKeys

Whencachingtheevaluationof a user-definedfunction,how aretheprimaryandsecondarycachekeys computed?Clearly, the primary key shouldcontainsomerepresentationof thefunctionitself, sincenocachehitsshouldoccuronold entriesif thefunctiondefinitionhaschanged.In fact,theevaluatorusesthefingerprintofthefunction’s parsetreeasthebasisfor theprimarykey.

What about the function’s arguments? As we have seen,folding all of theargumentsinto the primary key would producecacheentriesthat aretoo coarse-grained. However, not including any of the argumentsin the primary key wouldproducemany cacheentrieswith thesameprimary key. That would causecachelookupsto take longer, sincetherewouldbemoreentriesto searchthroughfor anygivenprimarykey — onefor eachtime thefunctionwasinvoked.

Theevaluatorchoosesacoursebetweenthesetwo extremes.It usesaheuristicthat folds thefingerprintsof simplearguments(i.e., booleans,integers,andtexts)into theprimary key, but performsfine-graineddependency analysison thecom-positearguments(i.e., bindings,lists, andclosures).That is, all dependenciesonvaluesnot capturedby the primary key are includedin the secondarykey. TheVestalanguageincludespragmasfor overriding this heuristic,but they arerarelyneeded.They areusedmainly by writersof bridgesto tunethesystemfor better

112

Page 125: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

cacheperformance;in any event, their presenceor absencecannotproducefalsecachehits.

7.4.2 Relation to Caching

As describedin the previous section,whenfacedwith a function invocation,theevaluatorfirst computesa primarykey from thefunction’s bodyandzeroor moreof theargumentvalues.In responseto this primarykey, thefunctioncachereturnsa setof secondarynames.The cachetreatsthe secondarynamesasmeaninglessstrings,but to theevaluatorthey representdependencies.Theevaluatorcomputesthevaluesof eachof thesecondarynamesin thecurrentcontext, fingerprintsthem,andsendstheresultinglist of valuefingerprintsto thecache.Thecachethentestsfor a hit aspreviously discussed.In theeventof a cachehit, theevaluatorusesthecachedresultvalue.

In theeventof acachemiss,theevaluatorproceedsto evaluatethefunctiononthegivenarguments.As it doesso,it representseachruntimevalueby a pair con-taining the truevalueandthe dependenciesdetectedwhile computingthat value.Value-dependency pairsarealsostoredfor any sub-valuesnestedinsidecompositevaluessuchaslists or bindings. Onceit hasfinishedevaluatingthe function, theevaluatorcollectsup thedependenciesin the function’s resultvalue. Any depen-denciesthatarenot alreadypartof theprimarykey areconsideredpartof thesec-ondarykey. Theevaluatorthencallsthefunctioncacheto createanew cacheentrywith thecomputedprimarykey, secondarykey, andresultvalue.It is importanttonotethat the cachedresultvalue, like all of the evaluator’s runtimevalues,itselfis a pair that includesdependencies.Whenever a laterevaluationgetsa hit on thisentry, both thevalueandits dependencieswill be neededso that thedependencyanalysisfor subsequentusesof thevaluecanproceedcorrectly.

For cachingto becorrect,thecacheentriescreatedby theevaluatormustsat-isfy a theorem,a formal statementof which is given in Section7.4.6below. Tounderstandthe intuition behindthe theorem,imaginethat the cachecontainsanentrywith primarykey pk thatresultedfrom evaluatingtheexpressione in a con-text c1. The secondarynamesassociatedwith this cacheentry will be the setofdependenciesDpnd) e7 c1 * .

Whenattemptingto evaluatee in anothercontext c2, the cachewill producea hit on the cachedentry only if the valuescomputedin c2 for the dependenciesDpnd) e7 c1 * matchthevaluesstoredin thecache.Note that thevaluesassociatedwith the dependenciesin the cacheare preciselythe valuescomputedin c1 forDpnd) e7 c1 * .

We thereforedefineequivalencebetweentwo contexts with respectto a setof

113

Page 126: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Dependency onType thecomponent’s %�%�% ComputedValue

V %�%�% completevalue Value) V : path7 c*?+ Eval) path7 c*X %�%�% existence Value) X : path= id 7 c*�+ Eval) path!id 7 c*D %�%�% domain Value) D : path7 c*?+@- n . Eval) path!n 7 c*A/T %�%�% type Value) T : path7 c*?+ Eval) typeof) path* 7 c*L %�%�% length Value) L : path7 c*?+ Eval) length) path* 7 c*E %�%�% expression(closures) Value) E : path7 c*?+ Eval) path7 c*�% body

Table7.4: Themeaningsof thesix dependency typesandtherulesusedby theevaluatorto evaluateeachtypeof pathin acontext c.

dependenciesd asfollows:

Equiv) c1 7 c2 7 d *?+�)CB p & d : Value) p 7 c1 *�+ Value) p 7 c2 *A*D%In this definition,Value) p 7 c* denotestheresultof “evaluating”thedependency pin thecontext c. It correspondsto thefingerprintassociatedwith eachsecondarynamein thecache.(We defineValue) p 7 c* for theparticulardependenciescreatedby theevaluatorin thenext section.)

Clearly, thecachehit will be correctif andonly if performingthe evaluationof e in c2 would producethe sameresultas the onestoredin the cache,that is,Eval) e7 c2 *�+ Eval) e7 c1 * . Therefore,to prove our cachingcorrect,we mustshowthat

Equiv) c1 7 c2 7 d *?EF: Eval) e7 c2 *�+ Eval) e7 c1 *D%After describingourdependency calculationalgorithm,wegiveaformalstatementof this requirementin Section7.4.6below.

7.4.3 DependencyTypes

Table7.4 presentsthe meaningsof eachof the evaluator’s six dependency typesduringa user-definedfunctionevaluation,alongwith the rulesit usesto computeValue) t : path7 c* , thevalueof thedependency patht :path in thecontext c. In thistable, the language’s primitive typeofand length functionscomputethe dynamictypeof avalueandthelengthof alist or binding,respectively. Thenotationcl % bodydenotestheclosurecl’s bodycomponent.

The V (value)dependency type recordsa dependency on the entirerun-timevalue.It is thestrongestdependency type,andhencesubsumestheotherfive. TheX typedenotestheexistence(or non-existence)of a namein a bindingor closure

114

Page 127: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

context; the D typedenotesthedomainof a binding (i.e., thesequenceof namesin the binding’s domain);the T type denotesthe run-timetype of a value; the Ltypedenotesthe lengthof a list or binding;andthe E typedenotesthevalueof aclosure’s expression(i.e., functionbody). Someof thesedependency types,suchastheT andL types,arisesolelyfrom theexistenceof certainlanguageprimitives.

Although somedependenciesaresubsumedby others,theevaluatordoesnotprunesuchredundantdependencieswhenit forms cacheentriesfor user-definedfunctions. It is unclearwhetherthecostof communicatingandstoringtheextrasoutweighsthecostof detectingandremoving them.

Thedependency typesthat theevaluatoruseswerechosenbasedon practicalconsiderationsaboutthe expecteduseof theVestalanguage.The currentdepen-dency typesareappropriatefor thestandardconstructionenvironment(Section5.4)andmany similar environments.It is possible,of course,thata radicallydifferentstyleof useof theVestalanguagemight reveal a needfor additionaldependencytypes.

7.4.4 DependencyCalculation Rules

We now give themathematicalrulesfor calculatingdependencies.We first defineD ) e7 c 7 p* where p is a dependency path; D ) e7 c 7 p* evaluatesto a setof depen-dency paths.We thendefineDpnd) e7 c*G+ D ) e7 c 7 V : 8H* . Intuitively, D ) e7 c 7 p* isthedependency for just theportionof e’svaluethatis selectedby p. Thedefinitionof D ) e7 c 7 p* now proceedsby caseson theprogramstructure:

$ If e + a ) a & Literal * , thenD ) e7 c 7 t : p*I+KJ . Evaluatinga constanthasnodependency on thecontext.

$ If e + x ) x & Id * , thenD ) e7 c 7 t : p*L+M- t : x = p/ . Evaluatinga variablexdependsonly on thedependency pathextendedon theleft by x.

$ If e +@; x % e1, only thefollowing two casesarise:

D ) e7 c 7 V : 8H*�+ FVs) e*D ) e7 c 7 E : 8N*�+O-A/

whereFVs) e* denotesthe setof e’s free variables. In both cases,the pathmustbeempty. If thetypeof thepathis V , thelambdaexpressiondependson thewholeclosurevalue,that is, thesetof e’s freevariables.If the typeof the path is E, it dependson only the lambdaexpressionof the closurevalue,namely, theexpressione. Sincee is incorporatedinto theprimarykeyof every functioncall whosebodycontainse, it is correctnot to recordanydependenciesin thiscase.

115

Page 128: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

$ If e = if e1 thene2 elsee3, thenwehave thefollowing rule:

d1 + D ) e1 7 c 7 V : 8H*' 1 + Eval) e1 7 c*

d2 + if ' 1 thenD ) e2 7 c 7 t : p* elseD ) e3 7 c 7 t : p*D ) e7 c 7 t : p*�+ d1 P d2

This rule statesthat thedependency of a conditionalis theunionof thede-pendenciesof theguarde1 andthedependency of eithere2 or e3, dependingonthevalueof e1. Weuseanemptypathin computingtheguard’sdependen-ciesbecausetheguardevaluatesto a booleanvaluethathasno components.

$ If e + [n1 + e1 7 n2 + e2 7 %(%(% 7 nk + ek], therearetwo casesto consider. Ifp is empty, it meansthatwe dependon theentirebinding. If p + t : ni = p1,it meansthat we dependonly on thevalueof field ni . The following rulescover thetwo cases:

D ) e7 c 7 t : 8H*�+ P ki Q 1D ) ei 7 c 7 V : 8H*

D ) e7 c 7 t :ni = p1 *�+ D ) ei 7 c 7 t : p1 *

$ If e + e1 = n, thenD ) e7 c 7 t : p*�+ D ) e1 7 c 7 t :n= p* . For bindingselection,werecursively call D with thepathextendedon theleft by n.

$ If e + e1!n, thenD ) e7 c 7 t : p*R+ D ) e1 7 c 7 X :n= p* . For thebindingdomaintest,werecursively call D with thepathextendedon theleft by n. Notethatthenew dependency pathhastype X, regardlessof thetypet .

$ If e + e1 > e2, thentherearetwo casesto consider:If p is empty, it meansthat we dependon the entirebinding. If p + t : n= p1, it meansthat wedependonly on thebinding that suppliesthen field. In thecasethat thenfield comesfrom e1, wemustaddthedependency thatn is notdefinedin e2.Thefollowing rulescover thetwo cases:

D ) e7 c 7 t : 8H*�+ (7.1)

D ) e1 7 c 7 V : 8H* P D ) e2 7 c 7 V : 8H* (7.2)

D ) e7 c 7 t :n= p1 *�+ (7.3)

if Eval) e2!n 7 c* (7.4)

thenD ) e2 7 c 7 t :n= p1 * (7.5)

elseD ) e2 7 c 7 X :n* P D ) e1 7 c 7 t :n= p1 * (7.6)

116

Page 129: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

$ If e + let x + e1 in e2, then

c1 + c SR- x T Eval) e1 7 c*A/d2 + D ) e2 7 c1 7 t : p*

d2a +@- pUV. pUW& d2 X head) pUY*RZ+ x /d2b +@- t U : pU . t U :x = pU & d2 /

D ) e7 c 7 t : p*�+ d2a P p[ 0 d2bD ) e1 7 c 7 pU *whereS denotestheoperationfor extendinga context, andhead) p* denotesthe first elementof p’s path. We first augmenttheevaluationcontext withx mappedto Eval) e1 7 c* and computed2 as the dependency of e2 in theaugmentedcontext. Wethendivide thedependency pathsin d2 into two setsd2a and d2b. The set d2a containspathsunrelatedto x. So, d2a must beincludedin the result. The setd2b containspathsstartingwith x. So, weneedto recursively computeD ) e1 7 c 7 pUY* for eachpathpU in d2b.

$ If e + e1 ) e2 * , then

Eval) e1 7 c*�+]\^; x % e3 7 c3 _d1 + D ) e1 7 c 7 E : 8H*

c1 + c3 S`- x T Eval) e2 7 c*A/d3 + D ) e3 7 c1 7 t : p*

d3a +@- pUa. pUb& d3 X head ) pUY*`Z+ x /d3b +@- t U : pU . t U :x = pU & d3 /D ) e7 c 7 t : p*?+ d1

) P p[ 0 d3aD ) e1 7 c 7 pU *A*) P p[ 0 d3bD ) e2 7 c 7 pUc*d*

This rule is similar to the rule for the let construct,wheree1 in the let ex-pressionis like theargumente2 here,ande2 in thelet expressionis like theclosurebodye3 here.

7.4.5 Example

Wenow presentasimpleexampleto demonstratetheabovedependency rules.Wecomputethedependenciesfor theexpression:

e + let x + [r + [s + y] 7 t + z] in x = r = sin thecontext c. Obviously, Eval) e7 c*?+ c ) y* . Hereis thestartof thedependencycalculation:

117

Page 130: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Dpnd) e7 c*egf definitionof Dpnd hD ) e7 c 7 V : 8i*

Sincetheexpressione is a let construct,thelet rule applies.Themainstepin cal-culatingthedependenciesfor thelet constructinvolvescalculatingthedependencysetnamedd2 in that rule, wheree1 + [r + [s + y] 7 t + z] ande2 + x = r = s. Herewe derive the value for d2, usingc1 to denotethe augmentedcontext c S9- x TEval) e1 7 c*A/ :

D ) x = r = s 7 c1 7 V : 8H*egf bindingselectionrule hD ) x = r 7 c1 7 V :s*egf bindingselectionrule hD ) x 7 c1 7 V :r = s*egf variablerule hf V :x = r = s h

Fromthedependency setd2, we computethepartitionedsetsd2a andd2b:

d2a +OJd2b + f V :r = s h

Wecannow continuecomputingDpnd) e7 c* :D ) e7 c 7 V : 8i*egf let rule hJ P D ) [r + [s + y] 7 t + z] 7 c 7 V :r = s*egf bindingconstructorrule hD ) [s + y] 7 c 7 V :s*egf bindingconstructorrule hD ) y 7 c 7 V : 8H*egf variablerule hf V : y h

Hence,theevaluationof e in c dependsonly onthevalueof y, aswewouldexpect.

7.4.6 Corr ectness

Thefollowing theoremstatesthecorrectnessof thedependency calculationrules.

Theorem 1 (CachingCorrectness)If the expressione evaluatesto a value ' inthecontext c1, thenwecancomputeDpnd) e7 c1 * , and,if everypathin Dpnd) e7 c1 *

118

Page 131: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

evaluatesto thesamevaluein contextsc1 andc2, thene alsoevaluatesto ' in c2.Formally,

)Cjk' : Eval) e7 c1 *�+l'm*�EF: (7.7)

)Cj d : Dpnd) e7 c1 *�+ d * (7.8)

X ) Equiv) c1 7 c2 7 Dpnd) e7 c1 *A*�EF: (7.9)

Eval) e7 c2 *3+ Eval) e7 c1 *A*d*D% (7.10)

Beforeimplementingthedependency algorithm,weformalizedtheaboveeval-uationanddependency rules for this subsetof the Vestalanguagein the Nqthmtheoremprover [9], andmechanicallycheckedthecorrectnesstheorem.Theproofis beyond thescopeof this paper. It took several iterationsof runningtheproverandcorrectingour rulesbeforethe mechanicalproof succeeded.This proof ef-fort revealeda coupleof subtleerrorsin earlier versionsof the rules. We thenimplementedthe rulesin theVestaevaluator. Although we did not mechanicallycheckthe rulesfor the completeVestalanguage,the subsetwe did verify coversthemostcomplex aspectsof thelanguage,andsothemechanicalverificationwasquiteuseful.

Ourdependency calculationis relatedto earlierwork by Abadi,Lampson,andLevy, which usesa labeled; -calculusto computebothdynamicandfine-graineddependencies[1]. However, their approachis quite different from ours. It asso-ciateslabelswith all expressionsin a function body, andthendevelopsrulesforkeepingthelabelsof only thoseexpressionsthatareevaluatedduringa call. As aresult,it recordsonly onekind of dependency, analogousto our valuedependen-cies. Also, their calculussupportsonly theselectionoperationon records.Com-puting dependenciesfor the binding operators! and+ complicatesthe problemsignificantly.

7.5 CachingModel Evaluationsand Model Values

We treattwo closelyrelatedtopicsin this section:cachingevaluationsof models,andrecordingdependencieson modelvaluesin othercacheentries.

7.5.1 Model Evaluations

Theevaluatorcreatestwo cacheentriesfor eachmodelevaluation:a specialentrythat takesadvantageof the specialsemanticsof models,anda normal entry thatcachesthe model evaluationin the samemanneras the evaluationof any other

119

Page 132: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

functioncall.3

The main differencebetweenspecialandnormalmodelentriesis in the dis-tribution of dependenciesbetweenthe primary and secondarycachekeys. Theprimarykey of aspecialmodelentrycombinestwo items:

1. the fingerprintof the immutabledirectorycontainingthe model,which (asexplainedin Section4.2.3)capturesthe stateof the entire immutabletreerootedat thatdirectory, and

2. themodel’s namerelative to thatdirectory.

Computingtheprimarykey in this way impliesthattheevaluationdependson themodel’sentiredirectorytree,includingthecompletetext of themodelitself. All lo-cally referencedfilesarethuscapturedin theprimarykey andneednotberecordedin thesecondarykey. All non-localmodelimportsarealsocaptured,becausetheabsolutepathnameof eachsuchimportappearsin thetext of themodelfile andthuscontributesto thefingerprint.(Unfortunately, any files in themodel’sdirectorytreethat themodeldoesnot referencearealsocaptured,aswell asirrelevantelementsin the model text suchasimportsthat arenot used,comments,andwhitespace.)Becauseso muchis capturedby the primary key, the secondarykey of a specialmodelentry includesonly fine-graineddependencieson themodel’s environment(“ . ”) parameter.

In contrast,the dependenciesof a normal model entry are calculatedin thestandardway, by treatingthemodelasaclosureof oneargumentwhosecontext isdefinedby its files andimport clauses.Thusanormalmodelentry’s primarykey capturesonly theparsetreeof themodelbody, while its secondarykey includesboth referencesto the environment(“ . ”) parameterand referencesto files andmodelsmadethroughthebody’s freevariables.

Why two kindsof entries?Specialmodelentrieshave fewersecondarydepen-denciesthannormalmodelentries,making themfasterto look up in the cache.More importantly, asmentionedat the endof Section7.1, specialmodelentriesserve ascut-off pointsto prevent too many dependenciesfrom propagatingup thefunction call graph: individual dependencieson files or imports of a model arenot propagatedbeyond calls or referencesto that model. Without thesecut-offs,theroot nodeof thefunctioncall graphof a build would containa dependency onevery sourcefile contributing to thebuild.

Moreover, cachehitsonspecialmodelentriesarecommon.For example,whenan applicationis built againstseveral frequently-usedlibraries, chancesare thatmostof thelibrarieswill beunchanged,sotheevaluatorwill getfastcachehitson

3Recallfrom Section5.3.4thatmodelsaresemanticallyequivalentto closuresof oneargument,soa modelevaluationis semanticallyidenticalto a functioncall.

120

Page 133: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

thelibraries’ specialmodelentries.In fact,theperformancebenefitsof thespecialcacheentriescreatedfor modelevaluationsareso appealingthat we sometimescreateaseparatemodelfor apieceof Vestacode,ratherthansimplywrappingthatcodein a functiondefinition within an existing model. This techniqueleadsto aslightly largernumberof modelfiles in exchangefor bettercachingperformance.

Falsecachemissesonspecialmodelentriesarepossiblebecausetheentriesareintentionallycoarse-grained.Whenlooking up a modelevaluationin the cache,theevaluatoralwayschecksfor aspecialmodelentryfirst; if thatlookupmisses,itchecksfor a normalmodelentry. A new specialmodelentry is createdwheneverthefirst lookupmisses,evenin theeventof acachehit on thenormalmodelentry.Althoughit is uncommonfor a cachelookupto misson a specialmodelentrybuthit on the correspondingnormalentry, this doeshappenoccasionally. Suchhitssave enoughwork to amplyjustify thecostof creatingthenormalentry.

7.5.2 Model Values

Recallfrom Section5.2.4thatevery importedmodelhasa correspondingclosurevaluethat canbe returnedas(part of) a function result just like any othervalue.Whenever any function evaluationdependson an unevaluatedmodel value, werecordthis dependency in a coarse-grainedmanner, usingthesameideadescribedabove for constructingtheprimarykey of amodelevaluation.

For example,supposefunction f1 calls modelm1, while function f2 returnsan unevaluatedmodelm2 aspart of its result. (Both thesecasesarecommoninour own usageof Vesta.) When f1 calls m1, the evaluatorwill of courseeitherfind or createa specialmodelentry for m1 as just described,and f1 will inheritthe limited secondarydependenciesfrom this coarse-grainedcacheentry aspartof its own dependencies.In thecaseof f2, althoughm2 is unevaluatedandhenceno cacheentry is involved, the evaluatorstill constructsf2’s dependency on m2

usingthesamemodelfingerprintingtechnique.Thatis, theevaluatorcreatesa“V”dependency whosefingerprintfield combines(1) thefingerprintof the immutabledirectoryfrom whichm2 wasimported,and(2) m2’snamerelativeto thatdirectory.As before,this techniqueis correctbecausem2’svalueasan(unevaluated)closureis fully specifiedby the immutabletext of the model file that definesit and theimmutablecontentsof thedirectorytreewherethefiles andimport clausesin thatmodelareresolved.

121

Page 134: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

7.6 Example Evaluation Call Graphs

We now considerthe effectsof the evaluator’s function cachingin practice. Weshow thecall graphsthatresultfrom theevaluationof variouspackages.

7.6.1 ScratchBuild of the Standard Envir onment

Figure7.1shows thecall graphthatresultsfrom evaluatingthemodelfor thestan-dardconstructionenvironmentagainstanemptycache.In this figureandthenexttwo, thenodesdenotefunctioninvocations,andanedgefrom anode f to anodegindicatesthat f callsg. Theedgeis solid blackif f andg aredefinedin thesamepackage,andgrayotherwise.

Thecall graphin this examplecanbedividedroughly in half. Theleft half ofthe figure shows the evaluationof a strippeddown “backstop”standardenviron-ment. This environmentcontainsbridgesandlibrariesnecessaryto build the fullstandardenvironment,which is shown in theright half of thefigure.

In bothcases,building thestandardenvironmententailsevaluatingbridgemod-els and library models. The library modelsaredivided into threegroups: C li-braries,Vestalibraries, and Modula-3 libraries. The full standardenvironmentcontainsthreemorebridgesthanthebackstopenvironment.Oneof these,the limbridge,requiresmorework, sincethelim tool exportedby thebridgemustbebuiltfrom source.

Notethatalthoughthis evaluationwasperformedagainsta cachethatwasini-tially empty, somecachehits still occurred.In particular, cachehits occurredonfive of thesix bridgesin the full environmentbecauseidenticalversionsof thosebridgeshadalreadybeenevaluatedin thebackstopenvironment. Cachehits alsooccurredon thecompletesub-treesof theC andModula-3libraries,againbecausethoselibrarieshadbeenevaluatedin thebackstop.In contrast,hits did not occuronall of theVestalibrariesbecausetheversionof thoselibrariesreferencedby thefull standardenvironmentmodel(i.e., version30) differs from theonereferencedby thebackstop(version28). Notethatnoneof thelibrary evaluationsresultedin atool invocation,sincein generaltheactualconstructionof a library is delayeduntilaprogramis built againstit.

7.6.2 ScratchBuild of the VestaUmbrella Library

Figure7.2 shows thecall graphthat resultsfrom building a “hello world” C pro-gramagainstthecompleteVestaumbrellalibrary, assumingthat the constructionof the standardenvironmentmodel shown in Figure7.1 is alreadycached. Theonly reasonfor building sucha simpleprogramagainstthe Vestaumbrellais to

122

Page 135: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

cach

e hi

tm

odel

cac

he h

itne

w e

ntry

new

mod

el e

ntry

new

_ru

n_to

ol e

ntry

stan

dard

env

ironm

ent b

acks

top

lim b

ridge

Ves

ta li

brar

ies

(ver

sion

30)

brid

ges

C li

brar

ies

Ves

ta li

brar

ies

(ver

sion

28)

Mod

ula−

3 lib

rarie

s

brid

ges

C li

brar

ies

Mod

ula−

3lib

rarie

s

lex

yacc

link

Fig

ure

7.1

: T

he

ca

ll g

rap

h f

or

the

co

nst

ruct

ion

of

the

sta

nd

ard

en

viro

nm

en

t m

od

el.

To

bu

ild t

he

sta

nd

ard

en

viro

nm

en

t, a

str

ipp

ed

do

wn

‘‘b

ack

sto

p’’

en

viro

nm

en

t is

first

co

nst

ruct

ed

as

sho

wn

inth

e le

ft h

alf

of

the

fig

ure

. T

he

en

viro

nm

en

t p

rop

er

is t

he

n b

uilt

, w

hic

h in

clu

de

s th

e c

on

stru

ctio

no

f th

ree

ad

diti

on

al b

rid

ge

s. O

ne

of

the

se,

the

lim

brid

ge

, re

qu

ire

s b

uild

ing

th

e li

m t

oo

l fro

m s

ou

rce

.

123

Page 136: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

cach

e hi

tm

odel

cac

he h

itne

w e

ntry

new

mod

el e

ntry

new

_ru

n_to

ol e

ntry

com

pile

hel

lo.c

link

hello

pro

gram

build

sta

ndar

dco

nstr

uctio

n en

viro

nmen

t

GC

libr

ary

preb

uilt

libra

ries

Bas

ics

libra

ry

RP

Clib

rary

arch

ive

conf

ignlib

rary

com

pile

con

figlib

rary

Log

libra

ry

Fin

gerp

rint l

ibra

ry

Rep

osito

rylib

rary

Fun

ctio

n C

ache

libra

ry

com

pile

Run

Too

l lib

rary

arch

ive

Run

Too

l lib

rary

Fig

ure

7.2:

The

cal

l gra

ph fo

r th

e co

nstr

uctio

n of

‘‘he

llo w

orld

’’ lin

ked

agai

nst t

he c

ompl

ete

Ves

taum

brel

la li

brar

y, a

ssum

ing

the

eval

uatio

n of

the

stan

dard

con

stru

ctio

n en

viro

nmen

t is

cach

ed. T

he b

ulk

of th

is fi

gure

dep

icts

the

cons

truc

tion

of th

e V

esta

um

brel

la li

brar

y, w

hich

con

sist

s of

sev

en p

rebu

iltlib

rarie

s an

d ni

ne o

ther

libr

arie

s bu

ilt fr

om s

ourc

e.

124

Page 137: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

illustratetheconstructionof theumbrella,whichconstitutesthebulk of thisfigure.Theactualconstructionof thehello world programitself consistsonly of the tworightmosttool invocations.

Althoughthefunctioncacheis relatively emptyin thisexample,theimportanceof goodcachingis evident.Thecachehit onthestandardconstructionenvironmentin theupperleft of thefigureis critically important,asit savestheevaluatorfromhaving to evaluatetheentirecall graphof Figure7.1!

Perhapsthemainthing to noticeaboutthiscall graphis its sheersize.Buildingthe Vestaumbrellalibrary from scratchrequires86 tool invocations. Exceptforthe fingerprint library, eachsub-libraryis constructedby compiling the library’ssourcesoneat a time, andthencollectingthe resultingobjectstogetherinto a li-braryarchive.

The constructionof the fingerprint library requiresa bit morework, sinceaprogramfor generatingaheaderfile of computedfingerprintconstantsmustfirst bebuilt andrun. Thisexampleshowshow theintermediateresultsof abuild (namely,theprogramfor generatingtheheaderfile) canbeinvokedlaterin thebuild. It alsoillustratesthatVesta’s distinctionbetweensourceandderivedfiles is not thesameastheC compiler’sdistinctionbetweensourceandobjectfiles;here,aC sourcefileis mechanicallygenerated,not storeddirectly in the repository, so it is a derivedfrom Vesta’s point of view.

7.6.3 Scratchand Incr ementalBuilds of the Evaluator

Figure7.3(a)shows the call graphthat resultsfrom building the Vestaevaluatorpackagefrom scratch,assumingthatthestandardenvironmentandVestaumbrellalibrary arecached.In additionto theVestaevaluatorexecutable,building thepack-agealsocausesa helperprogramandsomederiveddocumentationfiles to becon-structed.

Thisfigureis anevenbetterexampleof theimportanceof effectivecaching.Inthiscase,thereis ahit notonly ontheevaluationof thestandardconstructionenvi-ronment,but alsoon theentireVestaumbrellalibrary shown in Figure7.2. Hence,the 86 tool invocationsshown in that previous figure areall avoidedby virtue ofa singlecachehit. The constructionof the evaluatoritself is straightforward: its18 sourcesarecompiledinto objects,and thoseobjectsare then linked togetheragainstthe Vestaumbrellalibrary. The constructionof the helperprogramanddocumentationfiles is alsostraightforward.

Figure7.3(b) shows the call graphthat resultsfrom an incrementalbuild oftheVestaevaluator. Thisexampleillustratesthetypical casethatoccursduringtheinneredit-build-test loop of thedevelopmentcycle. This figurewasproducedbystartingwith thecachethat resultedfrom theevaluationof Figure7.3(a),modify-

125

Page 138: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

hit building Vestaumbrella library

compile Vesta evaluator sources

link Vesta evaluator

hit building standardconstruction environment

builddocs

buildhelper

program

(a)

cache hit

model cache hit

new entry

new model entry

new _run_tool entry

hit building standardconstruction environment

hit building Vestaumbrella library

hit compiling first 9Vesta evaluator sources

recompile modifiedsource file

hit buildingevaluator docs

hit buildinghelper programs

re−link Vestaevaluator

(b)

Figure7.3: Thecall graphsthatresultfrom building theVestaevaluatorpackagefrom scratch(a) andagainafter oneof the evaluator’s sourcefiles hasbeenmodified (b). In both cases,hits occur on the standardconstructionenvironmentandtheVestaumbrellalibraries.

126

Page 139: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

ing oneof theevaluator’s sourcefiles,andrebuilding. In this case,therearemanycachehitsandonly two tool invocations(oneto recompilethemodifiedsourceandoneto relink theevaluatorexecutable).Becausethehelperprogramanddocumen-tationsourceswerenot changed,high-level cachehits occuron thosepartsof theevaluation.

This examplealsoillustratesan interestingfeatureof theC andC++ bridges.In particular, notice that the compilationof 10 or more sourcesin a library orapplicationproducesa balancedsubtreein the call graphso that no morethan9files arecompiledtogetherdirectly undera singlenode. This “weepingwillow”effect is especiallyevident in the constructionof the GC library in Figure 7.2.Whentheevaluatorwasbuilt from scratch,thecompilationof its 18 sourceswasautomaticallydivided by the bridge into two groupsof 9. As a result, the workrequiredto recompilethefiles in thefirst groupcanbeavoidedby a singlecachehit. In thegeneralincrementalcase,thisdivide-and-conquer techniqueresultsin anumberof cachelookupsproportionalto the logarithmof thenumberof sourcesin the library or application,ratherthanthe linearnumberthatwould resultif thenaive flat organizationhadbeenused.

7.7 Err or Handling

Somecareis requiredto handleevaluationerrorscorrectlyin thefaceof caching.Two kindsof errorsarepossible:runtimeerrorsthatoccurwhile evaluatinganex-pressionin theVestalanguageandfailed run tool calls.Theevaluatorhandlesthetwo kindsof errorssomewhatdifferently.

When an evaluationerror occurs,suchas attemptingto selectan undefinedcomponentof a binding,theevaluationis aborted.An errormessageis printedtotheuser, alongwith theevaluationcall stackto indicatetheerror’s source.In thiscase,naturally, no cacheentriesarecreatedfor thefunctionevaluationsthatwerestill in progress(i.e., on thecall stack)at the time of theerror, sincethey hadnotyet returneda valuethat couldbecached.Earlier functioncalls in theevaluationthatdid completearecached,however, soif theevaluationis performedagain,thesameerrormessageis regeneratedquickly.

In contrast,whena run tool call fails, it may be undesirableto abort theentire evaluationimmediately. For example,when the evaluator is compiling along list of sources,the userwill often go off anddo somethingelse. If a com-pilation early in the list fails, the userwill be muchhappierif the evaluatorgoeson to compilethe restof the files beforehe returnsthanif it abortsimmediately.Therefore,whena run tool call fails, if an appropriatecommandline switchhasbeengiven,theevaluatorprintsanerrormessageandcontinues,indicatingthe

127

Page 140: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

erroraspartof thefunctionresult. (If this switchis not given,theevaluatorprintsan error messageandabortsthe evaluation.) Bridgesaretypically written to de-tectany sucherrorsandpassthespecialvalueERRup to their callers,causinganyattemptto usetheresultof thecompilationto fail andstoptheevaluation.

When a run tool call fails, althoughit would be theoreticallycorrecttocachethe failure and return it againif the build is repeated,it is preferablenotto do so, for two reasons.First, occasionallya tool invocationmay fail becauseof sometransientcondition that is outsideVesta’s control, sucha full disk or anetwork timeout. Capturingsucha failure in thecachewould make it impossibleto correctthe problemby clearingthe transientconditionandretrying. Second,Vestabridgesare usually written to print suchmessagesas a side-effect, ratherthanincludingthemaspartof the run tool result.Whena Vestauserneedstoreproducetheerrormessagesfrom a failed compile,it is importantfor him to beableto run thecompileragainto regeneratethemessages.

Therefore,failedcalls to run tool mustnot becached.Cachingany func-tion higherup in thecall graphwould beequallybad,sincea hit on suchanentrywould prevent the failed call to run tool from reoccurring,againsuppressingtheerrormessages.To handlethis situation,theevaluatorrecordsa cachableflagfor eachruntimevalue. Theevaluatorcomputesthecachableflag of a new valuebasedon thecachableflagsof the valuesfrom which thenew valueis produced.A cacheentry for a function evaluationis createdonly when its result value iscachable.

Gooderror reportinghasits costs. Whenthe evaluatorencountersa runtimeerror in a Vestamodel,it is importantfor it to reportthe location(file name,linenumber, andcolumnnumber)of eachentryon thefunctioncall stack;if this infor-mationwerenotprovided,failing modelswouldbevery difficult to debug. There-fore, whenthe evaluatorcachesa function resultvaluethat includesa closure,itmuststorenot only theparsetreeof theclosure’s functionbody, but alsotheorig-inal modelnamefrom which thebodywasreadandtheoriginal line andcolumnnumbersof its tokens. Theseextra fields increasethe function cache’s commu-nication,memory, andstoragecosts.However, our experiencedebuggingsystemmodelsthatuseclosuresdemonstratesthattheextra cachingcostsarewell worth-while.

128

Page 141: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 8

Weeder

Vestamanagesthecreationandnamingof derivedfilesautomatically. Thedeletionof derivedfiles is left to aseparatequasi-automaticmanagementprocesscalledtheVestaweeder. Theweederis alsoresponsiblefor deletingunwantedcacheentries,whichconsumesystemresourcesaswell.

8.1 Moti vation

Ideally, thefile deletionprocesswouldbecompletelyautomatic,perhapsoperatingaccordingto someheuristic. For example,it might deletefiles older thansomethresholdage,or thosethat hadnot beenrecentlyused. However, therearetwoproblemswith suchheuristicapproaches.

First,aderivedfile cannotsimplybedeletedirrespective of thosecacheentriesthatrefer to it. A seriousfailurewould resultif a client couldgeta hit on a cacheentrythatreferencedanonexistentderivedfile! Hence,thedeletionof derivedfilesmustbecoordinatedwith thedeletionof unwantedcacheentries.

Second,the frequency of derivedfile obsolescencetendsto vary dramaticallyfrom packageto package,dependingon how actively a packageis undergoingdevelopment.As a result,any fixedheuristicis likely to deletetoo muchin somepackages,andnot enoughin others. Sincethe developersknow the mostaboutwhichpackageversionsareworthpreserving,it makessenseto givethedeveloperscontrolover whatis kept,ratherthanrelying onaheuristic.

Thesepointssuggestthatthedeletionof derivedfiles andcacheentriesshouldbe managedby a commonprocess,and that the input to that processshouldbespecifiedby people. The Vestaweederis responsiblefor managingthe deletionprocess.It determineswhich derivedfiles andcacheentriesto keepusinga tech-niqueakin to garbagecollection,andthencommunicateswith the repositoryand

129

Page 142: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

functioncache,whichdo theactualwork of deletingeverythingelse.

Apart from the obvious requirementthat the weedernot deleteanything thatit wasmeantto keep,themainconsiderationin thedesignof theweederis that itbeasunobtrusive aspossible.By this we meantwo things.First, weedingshouldbe transparentto users.While a weedis in progress,it shouldbe possibleto donormalbuilds, andthe performanceimpactof weedingon the restof the systemshouldbeminimal. Second,theadministrative overheadof runningaweedshouldbenegligible. In particular, weedingshouldbean infrequentactivity, theweedershouldnot requiregrossamountsof disk spaceor memory, and the instructionstelling theweederwhatto keepshouldbesimpleto specify.

In thischapter, welimit ourdiscussionto theweeder’sdesign,implementation,andusability. We evaluatethe weeder’s performancein Section9.5 of the nextchapter, covering the measuredperformanceof weedingand the frequency withwhichweedingis requiredfor amedium-sizeddevelopmentproject.

To simplify the taskof writing weedinginstructions,the setof cacheentriesandderived files to keepis specifiedat a coarsegranularity;namely, at the levelof packagebuilds. If theweederis instructedto keepa particularpackagebuild,it automaticallykeepsall cacheentriesandderivedsgeneratedby thatbuild. Thepackagebuilds to keeparespecifiedusinga simplebut powerful patternlanguage(describedbelow), soinstructionfilescanbekeptshortandeasyto understand.Aswe shall see,oneof the advantagesto specifyingpackagebuilds is that they arequitestraightforwardto name.

What happensif the weederis run with instructionsthat tell it to deletetoolittle or toomuch?Thecostof deletingtoofew derivedfilesis thatnotenoughdiskspaceis freedup. In thatcase,theweedercansimply bere-runwith a smallerlistof packagesto keep.Thecostof deletingtoomuchis thatsomebuildsmayhave toberepeated.But becauseall sourcesin Vestaareimmutableandimmortal,andallbuildsarerepeatable,thereis never acorrectnessproblemassociatedwith deletingtoo much. Hence,thereis a time-spacetradeoff inherentin theweedingprocess.Weedingmorederived files freesup moredisk space,but it may requiretime torecreatethoseweededfiles (andcacheentries)if they areneededlater.

Why is Vesta’s clean-upprocesscalledweedinginsteadof garbagecollecting?In a normalgarbagecollector, thereis a cleardistinctionbetweenobjectsthatarereachablefrom a setof known rootsandthosethat areunreachablegarbage.InVesta,thereis no real “garbage”:any cacheentry is potentiallyusefulin a futurebuild. Hence,a personmustdecidewhich builds areworth keeping.Thesebuildsaretakenasthe “roots” of a typical markandsweepgarbagecollection;all otherbuilds areweededfrom thesystem.Decidingwhich builds to keepis subjective:oneperson’s flower is another’s weed.

130

Page 143: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

8.2 WeedingInstructions

Theinputto theweedernamestheexactversionsof control-panelmodelsonwhichtheevaluatorwasinvokedandwhosecorrespondingbuildsaredeemedworthkeep-ing. It wouldbetoo tediousto have to specifytheexactnameandversionof everyVestapackagebuild to bekept,sotheweeder’s input is apatternlanguage.Hereistheinput for oneof our recentweeds:

+ /vesta/west.vestasys.org/vesta/release/[LA ST-1,L AST]/. main.v es+ /vesta/west.vestasys.org/vesta/*/LAST/.mai n.ves+ /vesta/west.vestasys.org/vesta/*/checkout/ LAST/* /.main .ves

Thefirst line causesthetwo latestversionsof theVestareleaseto bekept.Thesecondline keepsthe last checked-in versionof every Vestapackagebuild, andthe last line keepsall checkout sessionbuilds of the last versionof every Vestapackage.Eachpatternendsin “ .main.ves ”, indicatingthateachpatternnamesarootcontrol-panelmodelonwhichtheVestaevaluatorwasactuallyinvoked.Theweederkeepsall derived files andcacheentriescreatedby suchroot invocations.In addition,it is possibleto specifyasa command-lineoption to the weederthatany buildsperformedrecently(i.e, in thelastn hoursfor somegivenn) shouldalsobekept.

Note that eachline is precededby a plus sign, indicatingthat theseversionsare to be kept. The weeder’s input languagealsoallows lines beginning with aminussign,which removesthespecifiedpackagebuilds from thesetof builds tokeep.Thelinesareprocessedin order;plusandminussignsmaybealternatedtosuccessively includeandexcludedifferentsetsof packagebuilds. The exclusionfeaturemakesthelanguageslightly morecomplex, andin hindsight,perhapsit wasnotnecessary;thusfar we have nothadoccasionto useit.

8.3 Implementation

Beforedescribingtheweeder’s implementation,wefirst describesomepreliminaryinformationaboutthefunctioncacheandevaluator.

First,we describethecachefrom theweeder’s point of view. Runninga setofVestaevaluationsagainstthecachecanbeviewedasgeneratinga directedacyclicgraph(DAG) in whichthenodesarecacheentries(i.e.,functioncallswhoseresultsarecached)andin which thereis anedgefrom entry f to entryg if andonly if thecall g wasmadeduring theevaluationof f ’s body. This DAG is in generalnot atree,becauseacacheentryacquiresanadditionalparentwhenever anew cachehitis producedon it. Therootsof theDAG correspondto theevaluationsof thetop-level (controlpanel)modelson which theVestaevaluatorhasbeeninvoked. The

131

Page 144: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

functioncachein factmaintainsthis graphasanexplicit datastructure;thenodesare cacheentriesthat are storedas describedin Chapter6, while the edgesarekepton disk asa separatefile calledthegraph log. Eachcacheentry is identifieduniquely by a small integer called its cache index, and the graphlog edgesarerepresentedin termsof cacheindices. Eachentry in the graphlog containstheindex of acacheentryandtheindicesof all its children.Thegraphlog is a“log” inthesensethatexceptduringweeding,new edgesareonly appendedto it; existingedgesarenever deletedor changed.

Second,apointaboutderivedfiles. To avoid largeruntimevalues,theevaluatorrepresentsthecontentsof asourceor derivedfile by theshortidof thecorrespond-ing file in the repository. For example,the resultvalueof a compiler invocationmight be a singletonbinding that mapsthe nameof a derived file to the file’sshortid. The shortidsof all files referencedin a function’s result value arealsostoredin thegraphlog entrycorrespondingto thatfunctionevaluation.1 Hence,allderivedsreferencedby anevaluationcanbereachedby traversingthefunctioncallgraphcorrespondingto thatevaluation.

So much for preliminaries. The weederitself works like a mark-and-sweepgarbagecollector. It readsthe user’s specificationof which modelevaluationstokeep,andtreatsthoseasthe rootsof its mark phase.It thenreadsthe graphlogwritten by the function cache,andmarksall cacheentriesreachablefrom any oftheroots.It alsowritesto a separatefile theshortidsof all derivedfiles referencedfrom any marked entry. Onceit hasmarked thederived files andcacheentriestokeep,the weedernotifies the repositoryand the function cacheto do the actualwork of deletingtheunmarkedderivedfilesandcacheentries(in parallel).

This descriptionglossesover two importantissuesin theweeder’s implemen-tation.

Thefirst issuearisesfrom thefact thatVesta’s designtargetsarelargeenoughthat the entire function call graphis not expectedto fit entirely in memory(seeSection3.3).However, thegraphissmallenoughthattheweedercanafford to keepasinglemarkbit in memoryfor eachcacheentry;it collectsthesebits togetherintoa largebit vector. Theweeder’smarkphasebeginsby markingeachof theroots.Itthenworksby iteratively scanningthegraphlog. Oneachscan,any graphlog entrywhosecorrespondingcacheindex is unmarked is written out to a new versionofthelog to bereadon thenext iteration.If, on theotherhand,thegraphlog entry’scorrespondingcacheindex is marked, themark bits correspondingto its childrenareset,andthegraphlog entryis thendropped.Thisprocessrepeatsuntil nomore

1For themostpart,thesearederivedfiles,but it is quitepossiblefor a functionto returna sourceaspartof its result.Strictly speaking,suchfilesarenotderived,but they muststill beprotectedfromweedingby virtueof beingreferencedin a functionresult.In thischapter, wewill simply referto allfiles referencedby a functionresultasderivedfiles.

132

Page 145: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

entriesgetmarked.In general,this algorithm requiresd iterations,whered is the depthof the

call graph. Although d is not expectedto be large, the weederimplementsanoptimizationto reducethenumberof iterationsfurther: insteadof writing out allunmarked graphlog entriesimmediately, it holdsasmany aspossiblein a fixed-sizememorybuffer, from which they areretrieved andprocessedif they becomemarked. Unmarked entriesare thenwritten out only whenthe buffer overflows.How big to make the memorybuffer is strictly a time-spacetrade-off: the largerthe buffer, the fewer iterationsare required. We report on our experiencefor aparticularbuffer sizein Section9.5.

Thesecond,moresubtleissueinvolvesthesynchronizationbetweentherepos-itory, functioncache,andweeder. Sinceweedingmayrequireanoticeableamountof time, the interfacesbetweentheweederandtheothertwo processeshave beendesignedso that weedingcan proceedconcurrentlywith normal builds. Thereareseveral aspectsto this problem,but herewe will focuson themostimportantone,namely, theprotectionof recentlycreatedderivedfilesandcacheentriesfromthe weeder. This descriptionomits several importantdetails,but it describestheessenceof thedesign.

Theweeder’s first actionis to checkpointthegraphlog; its remainingprocess-ing usestheresultingfile. Hence,theweederoperatesonasnapshotof thefunctioncall graph.Careis requiredbecausesomeevaluationsmayhave beenin progressat thetime thesnapshotwastaken.

According to the weeder’s marking algorithm, a derived file or cacheentryis protectedfrom weedingby virtue of a path in the graphlog from one of theuser-specifiedweederrootsto thederivedfile or cacheentry. But if anevaluationwas in progresswhen the snapshotwas taken, any new entriescreatedby thatevaluationarenotyetreachablefrom aroot,sincetherootentryis notcreateduntiltheevaluationcompletes.Hence,if no specialactionis taken,thesenewly createdentrieswouldall beweeded.

To prevent this problem, all newly createdcacheentriesare automaticallyleased[20, 36]. We usethe term leasein a broadsense:a leaseis an agreementbetweentwo or morepartiesthat remainsvalid up to a prearrangedtime even ifthe partiesarenot in active communication.In this case,the evaluatoracquiresa leaseon newly createdcacheentriesthat is alsorecognizedby thecacheserverandtheweeder. Leasescanberenewed;theleaseonacacheentryis automaticallyrenewedwhenever aclientgetsacachehit on thatentry.

A leaseprotectsa cacheentry from weeding. A leaseon a cacheentry alsoprotectsall cacheentriesand derived files reachablefrom that cacheentry. Atweedingtime, the cacheinforms the weederwhich entriesarecurrently leased,andtheweederadditionallyusesall suchentriesasrootsof its markphase.

133

Page 146: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Oncetheweederhascommunicatedto thefunctioncachewhich cacheentriesto delete,any lookup on an entry pendingfor deletionis treatedasa miss,evenif the entry still exists in the cache. Hence,thereis a small possibility that theevaluatormayhave to re-evaluatea functionwhosevalueis still in thecachebutwasaboutto bedeleted.Therewouldbenopoint in addingmachineryto keepthisrareandharmlesseventfrom occurring.

All leaseseventually expire. If leasesdid not expire, then no cacheentriesor derived files could ever be deleted! We have usedleasedurationsas long asthreedaysduringVestadevelopment,to allow for situationswheretheevaluatorisstoppedin thedebuggerfor a dayor two, andasshortasthreehoursin a produc-tion environmentwheredisk spacewasin shortsupply. For efficiency, our leaseimplementationmayactuallyprotectentriesfor up to twiceaslongasthenominalleaseduration.Whenchoosinga leaseduration,it is bestto erron theconservative(long) side;anunduly long leasecausesno problemsexceptfor preventingsomefilesandcacheentriesfrom beingdeletedassoonasthey couldhave been.

134

Page 147: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 9

Performance

This sectiondescribesVesta’s performance.We first comparethe overall systemperformanceto thatof Make, focusingon the resourcerequirementsof theVestacomponentsthattypically run on usermachines,namelytheevaluatorandruntoolserver. We then presentmore detailedperformancemeasurementsof the Vestacomponentsthat typically run on a centralserver machine,namelytherepository,functioncache,andweeder. Wealsodescribetheperformanceof ourremoteproce-durecall (RPC)library, which is usedfor communicationamongthecomponents.

Oneof Vesta’s maindesigngoalsis that it shouldscaleto theconstructionofvery largesoftware(seeSection3.3). Unfortunately, at thetime themeasurementsin this chapterwere taken, the loadswe had placedon Vestawere quite smallcomparedto thosefor which it wasdesigned.For example,we hadnever storedmorethanabout10,000entriesin thefunctioncacheatany onetime,two ordersofmagnitudelessthanthedesigntarget. In addition,thesystemhadhadatmostfoursimultaneoususers,oneto two ordersof magnitudelessthanthe designtarget.1

Hence,wedonothaveany performancemeasurementsdirectlyattestingto Vesta’sscalability.

However, all of the Vestacomponentshave beendesignedand implementedwith an eye toward goodscalability. Hence,we have reasonto hopethat asthesystemis put underincreasedload, any unforeseenperformancebottlenecksthatarisewill notbeshow-stoppers.2 In thissection,wearguefor Vesta’sscalabilitybyextrapolatingfrom our currentperformancemeasurementsandby describingthemeasureswehave takento avoid scalabilitybottlenecks.

1Later, a groupof about130developerswithin CompaqadoptedVestaandusedit to develop acodebaseof about700,000linesovera2.5yearsperiod.Unfortunately, wedonothaveperformancemeasurementsfor their installation.

2This hasproventruesofar with our productdevelopmentusers.

135

Page 148: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

9.1 Hardware Configuration

The measurementsdescribedin this chapterwere performedwith the standardVestasystemconfiguration.In this configuration,theevaluatorandruntoolserverrun on a single client machine,and the repositoryand function cacherun on asingleserver machine.For theMake experimentsdescribedbelow, we ranMakeon the client machine,using files served by a standardin-kernel NFS version3implementationrunningon the server machine. In both setsof experiments,theunderlyingfilesbeingservedby therepositoryandNFS3serversweremountedona local Digital AdvFSfile systemusingtwo directly attachedDigital RZ28SCSIdisks.

The client machineusedin our testswas a Digital AlphaStation500 5/333,with a 333 MHz CPU and192MB of memory. The server machinewasa Digi-tal AlphaStation400 4/233,with a 233 MHz CPU and192MB of memory. Thetwo machineswereconnectedby SRC’s local AN2 network, a prototypeof Digi-tal’sGIGAswitch/ATM high-bandwidthATM network. MoredetailsaboutVesta’scommunicationcostsaredescribedin Section9.6below.

Thesemachinesarenow two generationsold, soweexpecttheabsoluteperfor-mancein boththeVestaandMake caseswould besubstantiallybetteron modernhardware.

9.2 Overall SystemPerformance

In thissection,wereportourmeasurementsof theoverallperformanceof theVestasystemfrom auser’s pointof view.

First, we comparethe performanceof Vestaagainstthat of Make [17]. Ourmeasurementsshow thatVestaoutperformsMake onbothscratchandincrementalbuilds. In themostcommoncase,anincrementalbuild in whichasmallnumberoffilesarerecompiled,VestasubstantiallyoutperformsMake. This improvedperfor-mancecomesdespitethefactthatVestaprovidesconsiderablystrongerconsistencyguaranteesthanMake does.

Second,wepresentmoredetailedmeasurementsof Vesta’sclientperformance.Our measurementsshow that the Vestacachingmachineryperformsquite well.The cost of incrementalbuilds is indeedproportional to the magnitudeof thechange,not to the total sizeof the softwarebeingbuilt. Our measurementsalsoconfirmthatVestaspendsmostof its time in runtoolcalls,indicatingthattheover-headinducedby theevaluatorandfunctioncacheis low.

Finally, we characterizethe memoryusageof the evaluatorand the runtoolserver—the two Vestaprocessesthatareexpectedto run on client machines.The

136

Page 149: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

measurementsshow that the run-timememoryconsumptionof theseprocessesisreasonable:Vestadoesnot compromisetheperformanceof theotherapplicationson theclient machinebecauseof its memoryconsumption.

TocompareMakeandVesta,weranthreetests.TheHello testrequirescompil-ing andlinking a toy “hello world” program.TheEvaluatortestrequiresbuildingtheVestaevaluator, which in turn requiresbuilding all of theVestalibrariesusedby theevaluator. Finally, theReleasetestentailsbuilding theentireVestarelease,which includesbuilding all of the Vestalibraries,server programs,utilities, andtestprograms.

Table9.1summarizesvariousattributesof thethreetests.Whencountingthetotal sourcelines,we treatedall theC, C++, andheaderfiles in the Vestaimple-mentationas sourcefiles. Whencountingthe numberof modules,we includedonly theC files, not theheaderfiles. Thenumberof runtool calls reportedin thetableis thenumberof timesanexternaltool invocationis requiredduringthetest;it includesinvocationsof compilers,linkers,andarchivers.Finally, thenumberofpackagesreportedis the numberof separatepackagesaroundwhich the sourcesareorganized.

Total Numberof Numberof NumberofTest SourceLines Modules Runtools PackagesHello 10 1 2 1Evaluator 53,304 103 117 11Release 119,602 255 333 16

Table9.1: Sizesof theThreeBuild Tests

Eventhelargesttest’s sizeis well below thatof thesoftwaresystemsfor whichVestawas designed. In interpretingthe resultsof our measurements,we makeargumentsaboutVesta’sscalabilitywhenit is possibleto extrapolatefrom thedatapresented.

9.2.1 PerformanceComparison with Make

In thissubsection,wecomparetheperformanceof VestaandMakefor bothscratchandincrementalbuilds. As explainedin Chapter2, Make doesnot alwaysbuildconsistentsoftwaresystemsasVestadoes.Thecombinationof MakeandMakede-pend,thoughabetterapproximationto Vesta,still fallsshortof providing thesameconsistency guarantee.Figures9.1and9.2comparetheperformancesof VestaandMake on scratchand incrementalbuilds, respectively. Using MakedependwithMakewouldaddanontrivial amountof timeto theMaketimesin thetables,which

137

Page 150: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Time (secs)

Hello

3.3 3.4

Evaluator

310 318

Release

912960

Vesta

Make

Figure9.1: Elapsedtime in secondsfor scratchbuildsof thethreetestsusingVestaandMake.

wouldbequitesignificantin thecaseof incrementalbuilds.As shown in Figure9.1,Vestawasslightly fasterthanMake on scratchbuilds

of all threebuild tests.EventhoughVestaoutperformsMake on scratchbuilds, itis worth noting that scratchbuilds occur lessoften underVesta. Make usersof-ten initiate scratchbuilds becausebuilding a systemfrom scratchis theonly wayto guaranteeconsistency with Make. The needfor suchscratchbuilds is elimi-natedunderVesta,becauseVestaalwaysproducesconsistentincrementalbuilds.However, if a userchangesa low-level headerfile that is usedin many places,anear-scratchbuild will resultin which many files arerecompiled.Figure9.1 indi-catesthatVestaperformsquitewell on suchbuilds.

As shown in Figure9.2,VestaclearlyoutperformsMakeonincrementalbuilds.To testincrementalbuild performance,we touchedasinglesourcefile andrebuilt.In eachtest,this requiredoneinvocationof thecompilerandoneinvocationof thelinker, sowe estimatethatthetotal time spentrunningtoolswasroughlythesameunderVestaandMake. Figure9.2 shows theelapsedincrementalbuild timesforthethreetests.

Whenaprogramis composedof sourcesspanningmultiplepackages(asoccursin theEvaluatorandReleasetests),developersusingMaketypically runMakeonlyin thepackageon which they areworking; thesearethe timesreportedunderthe“Make One” column. However, suchincrementalbuilds canleadto inconsistentresultsif filesin any of theotherpackageshavechanged.Togeta(more)consistentbuild, Make can be run in all of the contributing packages;theseare the timesreportedunderthe “Make All” column. In this incrementaltest,however, we didnot modify any of theotherpackages,so thedifferencebetweenthe“Make One”and“MakeAll” columnsis simplythetimerequiredby Maketo determinethatthe

138

Page 151: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Time (secs)

Hello

3.3 3.4 3.4

Evaluator

12.515.1

23.3

Release

13.115.1

32.1

Vesta

Make One

Make All

Figure 9.2: Elapsedtimes in secondsfor one-module,incrementalbuilds under Vestaand Make. The “Make One” column is the timerequiredto run Make in the single packagein which the changewasmade,while the “Make All” columnalsoincludesthe time requiredtorunMake in eachof theprogram’sotherpackages.

otherpackageswereall up-to-date.We considerthe“Make All” scenarioto bemuchmoreof anapples-to-apples

comparisonwith Vestathanthe“MakeOne”scenario.However, neitherMakesce-nariocomescloseto matchingVesta’sconsistency guarantees,particularlybecausethey do not includethe time requiredto run Makedependin eachof the relevantpackages.

Exceptfor thetrivial Hello program,Vestaran86%to 145%fasterthanMakealone. We expect that Vesta’s performanceadvantagewill tendonly to increasewhenbuilding largersoftware,sinceincrementalbuilds underMake andMakede-pendrequiretime proportionalto thesizeof thesoftwarebeingbuilt, while incre-mentalbuildsunderVestarequiretimeproportionalto themagnitudeof thechange.This claim is confirmedby thefact thatVestaran86%fasterthanMake aloneontheEvaluatortest,but 145%fasteron thelargerReleasetest.

9.2.2 PerformanceBreakdown

In this subsection,we examinehow time wasspentby Vestain both scratchandincrementalbuilds.

We divide the total elapsedtime into threeparts: the time spentin the Vestaevaluatoritself, the time spentby theevaluatormakingremoteprocedurecallsontheVestafunctioncache,andthetime spentin externalruntool invocations.Notethat theoverheadsdueto the repositoryandthe tool encapsulationmachineryare

139

Page 152: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Time (secs)

Hello

0.5 0.6 2.2 3.3

Evaluator

10.1 15.2

285310

Release

29.6 41.8

840 912

Evaluator

Cache

Runtool

Total

Figure9.3: Breakdown of scratchbuild elapsedtimes in secondsforeachof Vesta’scomponents.

includedin theruntooltime. Wehave not analyzedtheseoverheadsin any greaterdetail.

For scratchbuilds, Figure9.3 shows that the time spentdoingevaluationandcachingfor non-trivial builds is no morethan8% of thetotal runningtime. More-over, thetimespentrunningtheevaluatorandcacheaccountsfor asmallerfractionof thetotal elapsedtime thelarger thesoftwarebeingbuilt. As yet largersoftwareis built with Vesta,we expectthis fraction to continueto declinesomewhat, andcertainlynever to increase.

The breakdown of the elapsedtime for incrementalbuilds is shown in Fig-ure9.4. It demonstratesthatVesta’s performancescaleswell, at leastfor medium-sizedprograms.EventhoughtheReleasesourcesaremorethantwice thesizeoftheEvaluatorsources,theabsolutetimesspentin theVestaevaluatorandfunctioncachearesmallandnearlyidenticalacrossthetwo builds. This resultsupportsourclaim that thetime spentin theVestasystemproperis proportionalto themagni-tudeof thechange,not to thetotal sizeof thesoftwarebeingbuilt.

9.2.3 CachingAnalysis

In Chapter7, we arguedthat cachinguser-definedfunction calls is necessarytoachieve goodoverall systemperformance.Cachinguser-definedfunctioncallshasits costs(mostnotably, thecostof computingeachfunction’s fine-graindependen-cies).In thissectionweshow thatthebenefitsoutweighthecostsin practice.

Tomeasuretheeffectivenessof cachinguser-definedfunctions,weranourtestswith threedifferentlevelsof caching.At thelowestlevel, we cachedonly runtoolcalls. This level of cachingis essentiallywhat Make provides,but with Vesta’s

140

Page 153: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Time (secs)

Hello

0.5 0.6

2.23.3

Evaluator

0.8 0.8

10.9

12.5

Release

1.1 1.3

10.7

13.1

Evaluator

Cache

Runtool

Total

Figure9.4: Breakdown of one-module,incrementalbuild elapsedtimesin secondsfor eachof Vesta’scomponents.

strongconsistency guarantee.At thenext level, we cachedbothruntoolcallsandmodelcalls.3 Modelsarea naturalcachingboundarybecauseevery packagebuildis performedby evaluatingthepackage’s rootmodel.Finally, we reportonVesta’sdefault behavior, which is to cacheruntool calls, modelcalls, andcalls of user-definedfunctions.Figure9.5shows theresultsof doing incrementalbuilds of thethreetestswith thesethreecachinglevels; thefinal columnshows the“Make All”elapsedtime from Figure9.2for comparison.

Wecandraw threemainconclusionsfrom thisdata:

o Cachingall function calls is substantiallyfasterthancachingonly runtoolcalls or runtool andmodelcalls together. We expectthat this performancedifferencewill becomeevenbiggerfor largersoftware.

o Cachingonly tool invocationsis clearly not sufficient. Whenonly tool in-vocationsarecached,the costof doing incrementalbuilds is proportionalto thesizeof thesoftwarebeingbuilt, not to thenumberof tool executionsrequired. This is especiallyevident in the caseof the Releasetest,wherecachingonly runtoolcallsled to somany requestsonthefunctioncachethatthebuild timewasanorderof magnitudelarger thanwhenall functioncallswerecached.

o Although cachingmodel calls in addition to runtool calls doeshelp, sub-stantialperformancebenefitsoccurwhencallsto user-definedfunctionsarealsocached.Again, the Releasetestdemonstratesthis point well: caching

3Recallthata Vestamodelrepresentsa one-argumentclosurethatcanbecalledfrom otherfunc-tionsandmodels.

141

Page 154: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Time (secs)

Hello

3.3

10.1 11.5

3.4

Evaluator

12.5

21.5 21.3 23.3

Release

13.1

22.2

126

32.1

All Calls

Runtools & Models

Runtools Only

Make All

Figure9.5: Elapsedtimesof one-module,incrementalbuildsin secondswith differentVestacachinglevels. For comparison,the final columnshowstheincrementalbuild timeunderMake.

modelsin additionto runtoolcallsled to afactorof 5.5speedup,but cachinguser-definedfunctioncallsaswell yieldedanotherspeedupfactorof nearly2.

Together, thesepoints show that the benefitsof performingfine-graineddepen-dency analysisclearlyoutweighits cost.

Recallfrom Section7.5 that in additionto thenormalcacheentrycreatedforeachmodel evaluation, the evaluatoralso createsa specialcacheentry. Thesespecialcacheentriesyield fastercachehits, and the overheadrequiredto createthemis quite small. In the testsabove, suchspecialcacheentriesaccountedforall the cachehits for model evaluations. This result suggeststhe usefulnessofthesespecialcacheentries,althoughwe have not measuredthe system’s overallperformancewithout them.

9.2.4 ResourceUsage

ThissubsectionreportstheCPUusageof themaincomponentsof theVestasystem,as well as the memoryusageof client processes.More detailedmeasurementsaboutthefunctioncacheandrepositoryarepresentedin Sections9.3and9.4below.

CPU Usage

To measurethe CPU usage,we wrote a simplescript to invoke ps(1)onceeverysecond.Weusedthisscriptto monitortheevaluator, theruntoolserver, thefunctioncache,andthe repositoryduringbothscratchandincrementalbuilds. During the

142

Page 155: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

CPU Load (%)

Scratch

2.1

0.07

1.8

16

Incremental

4.7

0.01

1.4

13Evaluator

Runtool

Cache

Repository

Figure9.6: ThemeanCPUloadsof theVestaevaluator, runtoolserver,function cacheserver, and repositoryserver during scratchand incre-mentalbuilds.

builds, therewasnootherbuild in progressthatwasusingthesamefunctioncacheor repository, i.e., theVestasystemwasbeingusedby asingleclient.

The evaluatorand the runtool server are run on the client machine. So, theaverageCPU overheadon the client machineis below 5%. We believe that theCPU load of thesetwo programsshouldremainabout the samewhen buildinglargersoftware.

Thefunctioncacheandrepositoryserverswererunonasinglededicatedservermachine,althoughthey couldberunonseparateservermachinesaswell. If multi-ple builds wereperformedsimultaneously, theCPUloadswould behigher. Basedonthedatain Figure9.6,Vestarunningontherelatively old hardwareusedin theseexperimentscould adequatelysupport50-100developers,assumingthe functioncacheandrepositorywererun on separateserver machinesandthatnot all devel-operswouldbedoingbuildssimultaneously. Morepowerful servermachinescouldbeusedto supporta largernumberof developers.In fact,aswe reportin thenextchapter, Vestahasbeenusedby a teamof 130engineersdevelopingacodebaseofapproximately700,000lines,andit heldupwell underthatload.

Client ProcessMemory Usage

In thetypical Vestasystemconfiguration,theevaluatorandthe runtoolserver arebothrun on theclient machine.Thememoryusageof theruntoolserver is alwayswell below 2MB. Theevaluator, on theotherhand,canconsumesomewhatmorememoryduringa large scratchbuild. To measurethememoryusageof theeval-uator, theevaluatorcalls ps(1)right beforeit exits. Figure9.7 givesthe memory

143

Page 156: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Mem Usage (MB)

Hello

6.2 6.2

N/A

Evaluator

8.7

6.2 6.3

Release

12.3

7.3 7.3

Scratch

1 File

5 Files

Figure 9.7: The run-time memory usageof the Vestaevaluator onscratchandincrementalbuilds. Thelasttwo columnsin eachcaseshowtheincrementalresultsin which 1 file and5 files werechanged,respec-tively. (The5-file changetestis notapplicablein theHello casebecausethatprogramconsistsof only a singlefile.)

consumptionof the evaluatorfor both scratchand incrementalbuilds of the testprograms.

How canthesememoryrequirementsbe expectedto scalefor larger builds?For incrementalbuilds, we expect to seesimilar memoryconsumption,even formuch larger software. As explainedin Section9.2.2, the costof an incrementalbuild is determinedby themagnitudeof thechange.Sincetheevaluator’s memoryconsumptionis roughly proportionalto the sizeof the call graphit mustevalu-ate, the memoryrequiredby the evaluatorto do an incrementalbuild is almostcompletelyindependentof thesizeof thesoftwarebeingconstructed.For scratchbuilds,however, it still remainsto beseenwhethertheevaluatorwill performwellwhenVestais usedto build multi-million line softwaresystems.

The memoryrequirementsreportedabove may be artificially high dueto thefact that theevaluatorusesa garbagecollectorfor its memorymanagement.Thecollectorusesheuristicsto decidewhenit shouldgrow its heap,andwe have ob-servedthecurrentcollectorgrowing theheapunnecessarily. We believe we coulddecreasetheevaluator’s memoryusagefor large scratchbuilds by tuning thecol-lector’s heuristics.However, doingsowouldprobablyleadto anincreasednumberof collections,which in turn might leadto somewhatslower evaluations.

9.3 RepositoryPerformance

Wedescribeseveralaspectsof therepositoryperformancein this section.

144

Page 157: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

First, we measurethe speedof basicfile operationsthroughthe repository’sNFS server interface. This aspectof repositoryperformanceis most importantduringbuilds,whenencapsulatedtoolsmake intensiveuseof therepositoryto pro-vide file service. If the repositoryis fastenoughto servicebuilds adequately, itwill easilymeetthe lighter demandsplacedon it by usersbrowsing throughtheappend-onlysourcetreeor editing files in mutablesourcedirectories.Our mea-surementsshow that repositoryperformanceis adequate.In comparisonson thesamehardware,therepositoryshows a write datarateof about96%of a standardNFSserver, a readdatarateof about55%,andcomparableperformanceon otheroperations.

Second,weexaminetherepository’s memoryanddiskspaceconsumption.Wepresentmeasurementsof thecurrentamountof sourcecodestoredin therepository(numberof packages,versions,files,etc.),togetherwith abreakdown of theactualmemoryanddisk spaceusedby theserver for variouspurposes.Our approachofkeepingdirectorystructuresin memoryappearssuccessful:a nontrivial but rea-sonableamountof memoryis used,andmemorywill notgrow unreasonablyastheamountof sourcecodeincreases.Ourapproachto file versioningappearssuccess-ful too: eventhoughwe do not usedeltasor otherformsof compression,derivedsstill take up considerablymoredisk spacethansources,andthe spaceconsumedby old versionsof sourcesis inexpensive at today’s diskprices.

Finally, we measurethe performanceof the repository’s developmentcycletools: vcheckout, vadvance, vcheckin, etc. Thesetoolsall run very fast(roughly0.5to 1.5elapsedsecondsfor thecasestested),makingthesystempleasantto use.

9.3.1 Speedof File Operations

We measuredthe speedof the repositoryasa file server on two commonlyusedfile systembenchmarks:theConnectathon’97 BasicBenchmark(CBB) [15] andtheModifiedAndrew Benchmark(MAB) [28, 40]. CBB is amicrobenchmarkthattestssmall groupsof relatedoperations.MAB is a higher-level benchmarkthatmeasuresperformanceon a softwaredevelopmenttask. Neitherbenchmarkpro-videsa measurementof file server performancein isolation;bothmeasureoverallfile systemperformance,includingthebuffer cacheof theclient operatingsystem.Both (especiallyMAB) canat timesbeCPU-bound,not I/O-bound.Nonetheless,thebenchmarksprovide a roughbasisfor comparisonbetweentherepositoryandanordinaryNFSfile server.

For thesemeasurements,weusedthestandardVestaconfigurationdescribedinSection9.1.Thebenchmarkseitheraccessedamutabledirectoryin theuser-spacerepositoryserver via NFS version2, or (for comparison)accesseda directory intheunderlyingAdvFSfile systemthroughthestandardkernel-spaceNFSversion

145

Page 158: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

3 server. In bothcases,accesseswereover SRC’shigh-speedATM network.

Test Description AdvFS+NFS3 Vesta+NFS21 file anddirectorycreation: 6.13 (0.19) 7.32 (0.38)

creates155filesand62 directories.2 file anddirectoryremoval: 5.56 (0.19) 6.53 (0.36)

removes155filesand62directories.3 lookupacrossmountpoint: 1.35 (0.07) 1.37 (0.04)

500getwdandstatcalls.4 setattr, getattr, andlookup: 11.38 (0.17) 3.04 (0.42)

1000chmodsandstatson 10files.4a getattrandlookup: 0.11 (0.02) 0.01 (0.03)

1000statson 10 files.5a write: 5.88 (0.57) 6.26 (0.82)

writesa1048576-bytefile 10 times.5b read: 1.40 (0.02) 2.54 (0.08)

readsa1048576-bytefile 10 times.6 readdir: 5.28 (0.18) 7.27 (0.24)

reads20500directoryentries,200files.7a rename:200renameson 10files. 3.51 (0.13) 6.75 (0.34)9 statfs:1500statfscalls. 1.16 (0.04) 1.18 (0.17)

Table9.2: ConnectathonBasicBenchmark,run on a standardfile sys-tem throughNFS version3, andon the VestarepositorythroughNFSversion2. Eachtableentryis anaverageelapsedtimein seconds;smallernumbersarebetter. All valuesareaveragedover 20 runsof thebench-mark.Standarddeviationsareincludedin parentheses.

On the ConnectathonBasicBenchmark(Table9.2), the repositoryis slightlyslower thanthekernelNFSserveronmostoperations,but therearesubstantialdif-ferenceson a few. Datawritesandreads(tests5aand5b)arethemostinteresting.Converting theelapsedtimesto bytespersecondandcomparing,we find that therepository’s write datarateis 96%of kernelNFS,while its readdatarateis 55%.

Wehavenotinvestigatedthereasonsfor thesedifferencesin detail,but it seemslikely thatwritesappearartificially fastunderVestabecausetherepositorycheats,asexplainedin Section4.3.5.TherepositoryservercurrentlyviolatestheexpectedNFS2semanticsby not forcing writesall theway throughto disk beforereturningto the client, so the benchmarkdoesnot get the expectedwrite-throughon closesemanticsat userlevel. ThekernelNFSserver, on theotherhand,implementstheNFS3protocolcorrectlyandthusdoesprovide write-throughon close.

146

Page 159: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Readslikely areslow becausethe datais really beingreadfrom disk, so theextra overheadof going throughthe user-spacerepositoryserver is fully visible.TheConnectathonreadtestflushestheclient machine’s buffer cachebeforeeachread.

Curiously, the repositoryis a greatdealfasterthankernelNFSon tests4 and4a. This maybedueto therepository’s in-memorydirectorystructure,but in anycasethedifferenceis probablyunimportantfor overall systemperformance.

Tests7band8 of thebenchmarkwereomittedbecausethey testhardlinks andsymboliclinks, neitherof whicharesupportedin repositorymutabledirectories.

Phase Description AdvFS+NFS3 Vesta+NFS21 CreateDirectories 947 (175) 756 (116)2 Copy Files 9286 (366) 5776 (280)3 DirectoryStatus 3609 (56) 3750 (90)4 ScanFiles 4393 (189) 4414 (112)5 Compile 22627 (536) 18913 (630)

Total 40862 (654) 33609 (840)

Table9.3: Modified Andrew Benchmark,run on a standardfile systemthroughNFSversion3, andontheVestarepositorythroughNFSversion2. Eachtableentry is anaverageelapsedtime in seconds;smallernum-bersarebetter. All valuesareaveragedover 20 runsof thebenchmark.Standarddeviationsareincludedin parentheses.

Thefirst phaseof theModified Andrew Benchmark(Table9.3) createsa treeof directories.Thesecondphasecopiesa 350KB collectionof C sourcefiles intothetree.Thethird phasetraversesthenew treeandexaminesthestatusof eachfileanddirectory. The fourth phasereadsevery file in the new tree. The fifth phasecompilesandlinks thefiles. This benchmarkdoesnot usetheVestaevaluatorortool encapsulation;herewe arecomparingonly theperformanceof therepositoryasafile server, not theoverall Vestasystemperformanceon softwarebuilding.

On this benchmark,surprisingly, the repositoryis actually somewhat fasterthanthe kernelNFS server in somephasesandin the overall time. We have notdonethedetailedmeasurementsthatwould beneededto find out for certainwhythis happens.It seemslikely that someof the difference(especiallyin Phase1)is dueto the repository’s in-memorydirectorystructure,andsome(especiallyinPhases2 and5) is dueto thelackof write-thoughon closediscussedabove.

Our purposein makingthesemeasurementswasnot to understandthereposi-tory’s NFSperformancein detail; it wassimply to establishthat theperformanceis adequatefor our purpose.Themeasurementsconfirmthis claim. Even though

147

Page 160: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

readingfrom the repositoryserver takesalmosttwice aslong asreadingfrom anin-kernelNFSserver, theactualelapsedtime to do builds is slightly less.Both thesmall, non-encapsulatedbuild measuredin theAndrew benchmarkandtheVestabuilds measuredin Section9.2.1show thiseffect.

In thefuture,wemayinvestigatethepossibilityof speedinguprepositoryNFSreadsandwrites by moving theminto the kernel. The repositoryserver doesnousefulwork on theseoperationsotherthanmappingfrom its file handles(longids)to the correctfiles in the underlyingfile system. It would be straightforward tomove this functionality into the kernel, getting the user-spaceserver out of theloop for thesetime-criticaloperations.All otherrepositoryNFSoperationswouldcontinueto behandledin userspace,minimizing theamountof codeaddedto thekernel. However, one big advantageto the currentdesignis that it is far moreportable.

9.3.2 Disk and Memory Consumption

For this section,we took a snapshotof our working repositoryasof October27,1997,andmadea variety of measurementson its contents. We wereusing thisrepositoryto developtheVestasystemitself. It alsocontainedafew othersoftwarepackagesthatwe hadconvertedto bebuilt underVestaastestcasesfor therepos-itory andevaluator. In particular, it includedtheJuno-2constraint-baseddrawingeditor[27, 29] andthemany Modula-3[38, 39] librariesrequiredby Juno-2.

Our measurementsdo morethanindicatehow muchdisk andmemoryspaceis requiredto storethe currentsnapshot;by extrapolationthey alsogive a usefulestimateof how spaceconsumptionwill grow asmorepackagesandversionsareaddedto the repository. We discussthis extrapolationafter presentingthe basicfigures.

The snapshotcontains69 top-level packagesand26 branches.Within these,thereare648checked-in versionsand10 reservation stubsfor versionsunderde-velopment.Thereare631checkout sessions;this includesbothactive sessionsas-sociatedwith thereservationstubs(10),andold sessionsassociatedwith checked-in versions(621).Within thesecheckoutsessionsthereare4,981packageversions,for agrandtotalof 5,629versions.

Summingover the entire snapshot,including both checked-in versionsandcheckout sessions,thereare 24,556directoriesand 407,662files. That is, thesnapshot’s externallyvisible hierarchicalnamespaceincludesthatmany distinctlynameddirectoriesandfiles. In the repository’s internalDAG structure,many ofthesedirectoriesandfiles sharestorage.If all 407,662files werestoredwithoutsharing,they would occupy 12,076,631,120bytes,or (consideringfragmentation)12,018,1301-KB blocks(about11.7GB).

148

Page 161: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

The mutableportion of the repositorywasalso includedin the snapshot.Itcontains29 directoriesand449files; again,someof thesefiles maysharestoragewith eachotheror with immutablefiles. If storedwithout sharing,thefiles wouldoccupy 2,9861-KB blocks(about2.9MB).

Therepository’s internalsharingof storagebetweenidenticalfilesshrinksthesenumbersgreatly. In reality, only 10,856distinctsourcefilesexist in thesnapshot’sshortidpool,occupying only 269,0301-KB blocks(about263MB).

How much disk spaceis beingspenton old versionsandbranches?To getanapproximateanswerto this question,we look at only thelatestversionof eachpackageandeachbranch.Thereare66 latesttop-level versions(not 69, becausea few packageshave no versions),below which are445directoriesand4,367im-mutablefiles; if storedwithout sharing,thesefiles would occupy 93,3911-KBblocks(about93 MB). Sinceonly oneversionfrom eachpackageis countedhere,thereis in factlittle or nosharing,sowecantakethesefiguresastheactualamountof spaceconsumed.Thus269p 030 q 93p 391 r 175p 6391-KB blocks(about172MB) areconsumedstoringold versionsandbranches.To summarize,about35%of thesourcedisk spaceis spentstoringlatestversions,andabout65%is spentonold versionsandbranches.

Thesnapshotwastaken immediatelyaftera weedthat keptonly thederivedsproducedby evaluatingthe latestversionof eachpackage,branch,andcheckoutsession;all otherderivedsweredeleted.This weedleft the snapshotwith 2,958derived files. (Somefiles canbe both sourcesandderiveds;we omit thosefromthiscount,includingthemonly assources.)Thesederivedsoccupy 288,3671-KBblocks(about282MB). Thus,acrosstheentirepoolof sourcesandderiveds,about17%of thespaceis consumedby latestsourceversions,31%by old versionsandbranches,and52%by deriveds.

Of course,if thesnapshothadnot beentaken just aftera weed,theproportionof derivedswould bemuchhigher. We tendto wait until our disk (about4 GB) isnearlyfull beforerunninga weed.At this point over 93%of thedisk is occupiedby deriveds.

Becausetherepositorykeepsall of itsdirectoriesin memory(seeSection4.3.2),wealsomeasuredtheamountof memoryrequiredto storethesnapshot.Therepos-itory’s packed, garbage-collectedmemorypool used3.0 MB to storeimmutabledirectories,0.34MB to storeappendableandmutabledirectories,0.44MB to storemutableattributes,anda few tensof kilobytesfor otherstructures.A checkpointthatwastaken just beforethesnapshotis 3.77MB long; this checkpointis anex-actimageof therepository’s runtimememorypoolandencodesits completestate.In addition,0.24MB of therepository’s general-purposeheap(memoryallocatedby new in C++) wereconsumedby an invertedindex that mapsfrom immutabledirectoryshortidsto the in-memorydirectorydatastructures(seeSection4.3.1);

149

Page 162: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

this index wasnot includedin the checkpointbecauseit canbe reconstructedatrecovery time. Thus,4.0 MB wereusedto store24,585directories(andrelatedstructuressuchasattributes).Theaveragecostwasabout171bytesperdirectory.

Therepositorysavesmemoryaswell asdisk spaceby storingthehierarchicalnamespaceinternally asa DAG. Although thereare24,585directorieswith dis-tinct hierarchicalnames,internallythesnapshotcontainsonly 8,411directorydatastructures(7,560immutableand851appendableor mutable).Within thosedirec-tories,additionalspaceis savedby recordingdirectoriesaslistsof changesrelativeto otherdirectorieswherepossible.Although thereare432,696files anddirecto-ries with distinct hierarchicalnames,andhencean averageof 432,696/24,585=17.60entriesin eachexternaldirectory, the internaldirectorydatastructurecon-tainsonly 85,650entries,for anaverageof 85,650/8411= 10.18entriesperinternaldirectory. Internaldirectoriescanalsocontainplaceholderentriesfor objectsthathave beendeleted;thesnapshotcontained1,866of these,anaverageof about0.22perinternaldirectory.

An internaldirectoryconsistsof oneor moreblocks(usually just one),eachcontainingafixed34-byteheader, apackedlist of entries,and(for appendableandmutabledirectories)someoptionalspacefor expansion.Thesnapshotused8,502blocksto storethe8,411directories,for a total of 0.28MB of header(anaverageof 34byteseach),andatotalof 0.21MB of expansionspacein the851appendableandmutabledirectories(anaverageof about258 byteseach).An entry containsa pathnamearc plus either10 or 26 bytesof overhead(dependingon whetheritpointsto a directoryor file, respectively). Thesnapshotuseda total of 2.8MB tostoreactivedirectoryentriesandanadditional29.0KB to storeplaceholderentriesfor deletedobjects.Theaveragecostwas33.78bytesperentry.

Steppingbackfrom thesedetails,whatdowelearnabouttherepository’s over-all memoryand disk consumption,and how can we expect it to grow as moresourcesarestored?

Disk consumptionfor sourcesis proportionalto thenumber(andsize)of dis-tinct sourcefile versions.After aboutayearof developingVestaitself underVesta,the dataabove shows abouta threefoldexpansionin disk storageby storing allpackageversionsinsteadof keepingonly themostrecentversionof eachpackage.Evenwith this expansion,andevenaftera weedthatkeptonly oneversionof thederiveds,derivedsoccupy moredisk spacethansources.Of course,we would ex-pectthe factorof threeto grow a bit worseon a sourcepool thatwasworked onfor a longertime andhadmoreactive branches.Still, consideringtoday’s rapidlyfalling disk prices,andthefactthathumansarenot learningto typein new sourcecodeat correspondinglyhigher rates,we believe that the disk spacerequiredto

150

Page 163: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

storeold sourceversionsis notaproblemin Vesta.4

Memoryconsumptionis moreof aconcern,but hereagainit is roughlypropor-tional to thenumberof distinctsourcefile versions.Eachnew sourcefile versionaddedto therepositoryrequiresatmostasmallconstantnumberof directoriesanddirectoryentriesto beaddedto thedatastructure.A conservativeestimate,roughlyvalid if a separatevadvanceis usedto inserteachnew file version,is onenew di-rectoryandtwo new directoryentriespernew file version,for a total of about100bytes.Thus,a repositoryserver machinewith 200MB of physicalmemory(notatall unreasonableat today’s memoryprices)could hold thedirectorystructureforsometwo million sourcefile versions.Moreover, therepositorywill still run cor-rectlywith lessphysicalmemory;it will simplyrunmoreslowly dueto paging.Asmemorypricescontinueto drop, larger repositorieswill becomefeasible. Therearealsoa few opportunitiesto shrink the presentdatastructures:eliminatingtheper-directoryexpansionspacewould save almost10%with essentiallyno lossinperformance,5 andmoresavingswouldbepossibleby tradingtime for space.

9.3.3 Speedof RepositoryTools

Toevaluatethespeedof therepositorytools,weranasimplebenchmark.Eachstepof the benchmarkappliedthesameoperationonceeachto 10 separatepackages.Everyrepositorytool tookwell under1.5secondsto run,andmosttook0.5secondsor less. Stepsthat copiednew sourcecodeinto the repository, or modified anexisting sourcefile for thefirst time (triggeringa copy-on-write), took longerbutwerein line with theperformancemeasurementsof Section9.3.1above.

Herearethestepsof thebenchmarkin detail.

1. Runvcreateto createanew package.

2. Runvcheckout to createacheckout sessionfor theemptypackage.

3. Copy the repository’s own sourcecodefrom an ordinary (NFS-mounted)file systeminto thenew workingdirectoryfor thepackage.Thesourcecodeconsistsof 92 files,containing835,120bytesor 8761-KB blocks.

4. Runvadvance to install thesourceasthefirst versionof thecheckout ses-sion.

4In the Vesta-1repository, we implementedan optionalfeaturethat could compressold sourceversionsby encodingthemasdeltas,similar to the representationusedby RCS.Whenwe put thesysteminto daily use,however, we found that 80% of our disk spacewas typically occupiedbyderivedfiles[13]. Thus,delta-compressingthesourcefileswouldhavesavedonly asmallpercentageof ouravailablediskspace,soweneverbotheredto turnonthefeature,andweomittedit from Vesta-2 entirely.

5In fact,we madethis changein a laterversionof Vestathantheonemeasuredhere.

151

Page 164: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

5. Runvadvanceagain.In this casevadvancedetectsthat theworking direc-tory hasnotchangedanddoesnot createanew version.

6. Touch (modify) one file in the package,triggering a copy-on-write. Thechosenfile was79,357byteslong.

7. Runvadvance.

8. Touch(modify) all the files in the package,triggeringa copy-on-write forevery one.

9. Runvadvance.

10. Runvcheckin, installingthefinal versionfrom thecheckout sessionasver-sion1 of thepackageanddeletingtheworkingdirectory.

11. Runvcheckout on thepackage,creatinga new checkout sessioninitializedfrom version1.

12. Again touchonefile, sameasstep6.

13. Runvadvance.

14. Yetagaintouchonefile, sameasstep6.

15. Runvadvance.

16. Runvcheckin, installingthefinal versionfrom thecheckout sessionasver-sion2 of thepackageanddeletingtheworkingdirectory.

Theentirebenchmarkwasrun 5 times,eachtime on 10 packages.Theresultsareshown in Table9.4.Theaveragetime takenperpackageis givenfor eachstep,roundedto two significantdigits.

We arevery satisfiedwith the tools runningat this speed.Subjectively, theyaremorethanfastenoughto make thesystempleasantto use.

9.3.4 Speedof Cross-RepositoryTools

Several yearsafter taking the measurementsin the previous section,we repeatedthe benchmarkto comparethe performanceof the tools on the single-repositoryandcross-repository(Section4.4.9)cases.Whenthemeasurementsin thissectionweretaken,boththecomputersandthenetworksusedfor theearliermeasurementshadbeenreplacedwith fasterhardware.We omittedsteps14 and15 of theearlierbenchmarkin this run,sincethey arepurelylocal andduplicateearliersteps.

152

Page 165: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Step Description Timeperpackage1 createanemptypackage 260ms2 checkout thenew package 560ms3 copy in 835KB of sourcecode 12000ms4 advancethepackage 1400ms5 advanceagain(nochanges) 120ms6 toucha79 KB file 110ms7 advancethepackage 200ms8 touchall 92files in thepackage 7600ms9 advancethepackage 1300ms

10 checkin thepackage 470ms11 checkout thepackage 660ms12 toucha79 KB file 140ms13 advancethepackage 180ms14 toucha79 KB file 120ms15 advancethepackage 190ms16 checkin thepackage 460ms

Table9.4: Vestarepositorytool performance.The entirebenchmarkwasrun 5 times,eachtime on 10 packages.Theaveragetime takenperpackageis givenfor eachstep,roundedto two significantdigits.

Table9.5shows theresultsof thecross-repositoryperformancebenchmark.Inthis test,thestepslistedwererun in order, 50 timeseachon 50 separatepackages.Thetablegivestheaveragetime for eachstep,roundedto two significantfigures.TheLocalcolumnis thesinglerepositorycase,Nearbyis thecross-repositorycasewherethe local and remoterepositoriesare connectedby a single hop of giga-bit ethernet,and Distant is the cross-repositorycasewherethe repositoriesareon oppositecoastsconnectedby tenhopsthrougha corporateintranet. Note thatcopying, touching,and advancingare always local operations. Eachrepositorywas running on a 500 to 600 MHz Alpha 21164Aprocessor. In eachcase,thetoolswererun on a client workstationwith a 667MHz Alpha 21264Aprocessor,connectedto thelocal server by 100Mb ethernet.As thetableshows, thetoolsarevery fast in the local andnearbycases,andfastenoughto be usableeven in thedistantcase.

Comparingthefiguresfrom theolder testsof Table9.4 doneon slower hard-warewith the newer onesin Table9.5, the tools have speededup on most tests,but appearnotablyslower on advancetests4 and9. Therearetwo reasonsfor thedifference:first, the testsweredoneon a larger package(835 KB vs. 1204KB),

153

Page 166: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Step Description Local Nearby Distant1 createanemptypackage 50 ms 250ms 6400ms2 checkout thenew package 64 ms 590ms 5700ms3 copy in 1204KB of sourcecode 5300ms 5400ms 5200ms4 advancethepackage 2500ms 2600ms 2500ms5 advanceagain(no changes) 170ms 170ms 180ms6 toucha108KB file 34 ms 32 ms 30 ms7 advancethepackage 160ms 180ms 180ms8 touchall 92files in thepackage 3200ms 3300ms 3200ms9 advancethepackage 2500ms 2500ms 2600ms

10 checkin thepackage 110ms 780ms 18000ms11 checkout thepackage 49 ms 730ms 5800ms12 toucha108KB file 49 ms 73 ms 59 ms13 advancethepackage 150ms 160ms 160ms14 checkin thepackage 64 ms 170ms 4700ms

Table9.5: Vestarepositorytool performance,comparingthelocal (sin-gle-repository)casewith two cross-repositorycases,onewherethe re-moterepositoryis nearbyandtheotherwhereit is distant.

andsecond,themeasurementsin Table9.4weretakenbeforewe implementedfin-gerprintingfiles by content(Section4.2.3),so they do not includefingerprintingcosts.

9.3.5 Speedof the Replicator

Table9.6givesa few simplemeasurementsof theVestareplicatorvrepl describedin Section4.4.8.Likethecross-repositorytool measurements,thesemeasurementsweretakenseveralyearslaterthantheothersin thischapter, andboththecomputersandthenetworksusedfor theearliermeasurementshadbeenreplacedwith fasterhardware.For thefollowing measurements,eachrepositorywasrunningon a 500to 600MHz Alpha21164Aprocessor. In theNearbycases,thetwo repositoryhostmachinesweredirectly connectedvia gigabit ethernet. In the Distant cases,thetwo machineswereon oppositecoasts,connectedvia 10 hopsthrougha corporateintranet. The replicatoritself was run on a client workstationwith a 667 MHzAlpha 21264Aprocessor, connectedto the local server by 100Mb ethernet.As aroughpoint of comparison,the rcp columnsgive the time to copy the samefilesdirectlyoutof thenativefile systemthattherepositoryis built on topof. All timesareaveragedover threetrialsandroundedto two significantfigures.

154

Page 167: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

vrepl rcpReplicate Size Nearby Distant Nearby Distant+repos/124to emptyrepository 1.2MB 1.7s 45s 17s 57 s+repos/125with 124present 582KB 0.6s 14s 1.6s 14 s@repos/124to emptyrepository 127MB 140s 3200s 620s 2600s@repos/125with 124present 640KB 9.6s 120s 1.4s 13 s

Table9.6: Vestareplicatorperformance.

9.4 Function CachePerformance

9.4.1 Server Performance

Recall from Section6.3 that looking up an entry in the function cacheis a two-stepprocess.In thefirst step,theevaluatorinvokesthecache’s SecondaryNamesfunctionto learnthesetof all secondarydependency namesassociatedwith agivenprimarykey (PK). In thesecondstep,it invokesthecache’s Lookup function. Inthe event that thereareno entriesassociatedwith the primary key passedto theSecondaryNamesfunction, the Lookup call is skipped. In the event of a cachemiss, the evaluatorinvokes the cache’s AddEntry function to createa new entryandaddit to thecache.

We measuredthe elapsedtime spentin the function cacheserver processtohandlevariousRPCrequestsfrom theclient. Thesemeasurementsweremadebyperformingseveral identicalexperiments.Eachexperimentconsistedof perform-ing a scratchbuild of the Vestaevaluator(startingfrom a nearlyempty functioncachein which only the standardenvironmenthadbeenbuilt), followed by fiveincrementalbuilds of theevaluatortriggeredby touchingasingleevaluatorsourcefile. Themeantimesfor eachoperationwerecalculatedfor eachexperiment;themeansandstandarddeviationsof thosemeantimesarereportedin Table9.7.

Number Mean Std.Dev.Operation of Calls Time(ms) (% of Mean)SecondaryNames 518 16.8 0.63%Lookup 112 11.7 0.44%AddEntry 438 8.1 0.24%

Table9.7: Elapsedtimesof variousfunctioncacheserveroperations.

A moredetailedanalysisrevealsthat missesaccountfor 34% of the Lookupcalls and take 17.8 ms on average. In this experiment,90% of the hits were tocacheentriesin memory, takingonly 6.3mseach,while theremaininghitsto cacheentrieson disk took 30.3ms on average.The limiting factorin cacheoperations

155

Page 168: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

seemsto be disk latency. We suspectthe function cacheandrepositorymay becompetingfor CPUanddisk. Suchcontentioncouldbeeliminatedby runningtheprocesseson separateserver machinesandby storing their files on separatefilesystems.

The performanceresultsabove are for a relatively small build. As anotherexperiment,we performeda scratchbuild of theentireVestareleasefollowed byincrementalbuilds of newer andolder versionsof the entireVestarelease.Thenumberof cacheoperationsin this secondexperimentwassignificantlylarger, aswasthenumberof new entriesaddedto thecache.Table9.8shows theresults.6

Number MeanOperation of Calls Time (ms)SecondaryNames 4,351 29.0Lookup 763 44.0AddEntry 3,948 21.6

Table9.8: Elapsedtimesof variousfunctioncacheserveroperationsfora scratchbuild of thecompleteVestareleasefollowedby two incremen-tal builds.

ComparingTables9.7and9.8,weseethatcacheperformancedoesdegradeasthe numberof cacheentriesis increased.Although we do not have enoughdatato meaningfullyextrapolatehow the function cachewill performfor even largerbuilds, the resultspresentedin Section9.2.2indicatethat the functioncacheper-formswell for themedium-sizescratchandincrementalbuildswehaveperformed.

9.4.2 StableCacheAttrib utes

Recall from Section6.5 that cacheentriesarepartitionedfirst into PKFiles,andtheninto CFPgroups;only the entriesin a singleCFPgroupneedbe consultedon eachLookupoperation.To determinetheeffectivenessof thisorganization,wemeasuredvariousattributesof thefunctioncache’s stablecacheentryfiles.

The meanvaluesof variousfunctioncacheattributesareshown in Table9.9.Thesestatisticsweremeasuredfrom astablecachecontaining13,900cacheentriesdistributedover10,183PKFiles.Thetotaldiskspacerequiredto storetheseentrieswas112MB, or 8.2KB percacheentry.

Thestatisticsindicatethattheseparationof entriesinto CFPgroupsis workingwell: the averagenumberof entriesper CFPgroupis only slightly morethan1.Moreover, of the average52.4 secondarynamesper PKFile, more than99% are

6This table doesnot include standarddeviations becausethe experimentwas performedonlyonce.

156

Page 169: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Attribute MeanValueNumberof CFPgroupsperPKFile 1.33Numberof entriesperCFPgroup 1.02Cacheentrysize(Kbytes) 8.24Functionresultvaluesize(Kbytes) 5.82Numberof secondarynamesperPKFile 52.4Percentageof commonnamesperPKFile 99.1%Percentageof uncommonnamespercacheentry 2.5%

Table9.9: Meanvaluesof variousfunctioncacheattributes.

common.Thus,only averysmallnumberof fingerprintsneedbecomparedduringa typical lookupoperation.

9.4.3 ResourceUsage

This subsectionreportson the functioncache’s memoryanddisk usage;its CPUusagewasreportedearlierin Section9.2.4.

Thecacheserver’smemoryrequirementsaredominatedby thestorageusedforin-memorycacheentries.Much like a virtual memorysystem,thefunctioncacheflushesunusedcacheentriesto disk over time. Its policy canbeadjustedto flushentriesmoreaggressively if toomany entriesarebeingretainedin memory.

As shown in Table9.9,theaveragefunctionresultvaluealoneis nearly6KB insize. The function resultvalueis thedominantfactorin thecache’s memoryuse.Hence,onamachinewith 100MBof availablemainmemory, thecachecouldkeepabout18,000entriesin memorysimultaneously. Consideringthat a build of theentireVestasystem(including the completesetof Modula-3 libraries)generatesonly about3,500cacheentries,thefunctioncache’smemoryusageshouldscaleupwell to largerprojects.

The function cacheusesthe samegarbagecollector as the evaluator, so asdescribedin Section9.2.4, its actualusagemay be somewhat more thanstrictlyneeded.Thecommentsin thatsectionabouttuningthecollector’s heuristicsapplyequallywell to thefunctioncache.

Thefunctioncache’sdiskrequirementspercacheentryarelargerthanits mem-ory requirements.The extra costsaredueto metadata—suchas the PKFile andCFPgroupdatastructures—andto extra informationstoredwith stablecacheen-tries thatallows themto beefficiently shuffled into new CFPgroupswhena PK-File’s setof commonnameschanges.On average,a cacheentryrequiresroughly8.25KB of disk space.To accommodatethedesigntargetof 12 million cacheen-tries,thefunctioncachewould thusrequireroughly95GBof disk storage.

157

Page 170: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

9.4.4 Scalability

In this subsection,we describepotential scalability bottlenecksin the functioncache,andthestepswe have takento avoid them.

s Lock Contention. As thenumberof clientsincreases,overall functioncacheperformancemightdegradedueto lock contentiononthecache’s in-memorydatastructures.Toavoid thisproblem,thefunctioncacheusesrelatively fine-grainedlocking: thereis a separatelock on eachin-memoryPKFile anditscacheentries.Moreover, thelock on thecache’s centralstructuresis heldasbriefly aspossiblein theplaceswhereit is required.

s Bad CacheEntry Distribution. Thefunctioncache’sperformancewill suf-fer if thenumberof cacheentriesperCFPgroupgrows too large. However,dueto thefunctioncache’s schemeof dividing PKFilesinto CFPgroups(asdescribedin Section6.5),wedonotexpectthisproblemto beserious.Whentheproblemhasarisenin thepast,it wasusuallydueto a flaw in theeval-uator’s cachingstrategy, or it could be correctedby changingbridgemod-els so the evaluatorcomputedbetterprimary keys for their functions. TheVCacheStatsprogramthatwasusedto gatherthestatisticsfor Table9.9canalsobeusedto uncover skewedcacheentrydistributions,sosuchproblemsareeasilydetected.

s Disk Latency. As the numberof cacheentriesper PKFile increases,thefiles grow largerandmoretime maybespentwaiting on disk readsduringlookups.Two factorsmitigatethis effect. First,we expectthatanincreasednumberof entriesperPKFile will accountfor at mostoneof thetwo ordersof magnitudein cacheentry growth cited above. (The otherorderof mag-nitudewill appearin theform of anincreasednumberof PKFiles.)Second,the PKFile disk format hasbeendesignedto minimize the numberof diskreadsrequiredper lookup. In particular, the index of CFPgroupsis storedseparatelyfrom the cacheentriesin the hopethat the entire index canbereadin a singledisk operation.Similarly, extra cacheentry informationnotrequiredfor lookup is storedseparatelyfrom the entries,againto decreasethenumberof readandseekoperationsin a typical lookup.

s Memory Usage.Thereis anobvioustime-spacetradeoff betweenthenum-berof cacheentrieskept in memoryandthespeedof a typical lookup. Asthe numberof cacheentriesandclientsincreases,a smallerfraction of the“workingset”entriescanbekeptin memory. Onesolutionto thisproblemissimply to installmorememoryon theserver machine.But evenif noentries

158

Page 171: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

werekept in memory, the detailedresultsdescribedin Section9.4.1aboveimply thattheoverall functioncacheperformancewouldnotdegradebadly.

9.5 WeederPerformance

In evaluatingtheperformanceof the weeder, two measuresareof interest.First,we considerhow often weedsare necessary. Second,we considerhow long ittakes the weederto run. Becauseweedingis a backgroundprocess,both issuesarenearlyinvisible to users.However, someadministrative effort is neededto runthe weeder(or to set it up to run automatically),and builds are likely to run abit moreslowly while theweederis active andcompetingfor resources,so theseperformanceissuesareworthconsidering.

How oftenareweedsnecessary?Theanswerdependson how quickly thediskusedby the repositoryandfunction cachefills up, which in turn is a function ofthe disk’s size,the rateat which new cacheentriesandderivedsarecreated,andthesizesof thosecacheentriesandderivedfilesondisk. Ourexperience,in whichthreedeveloperswereactively doing builds againsta 4GB disk, is that weedingwasrequiredaboutonceevery two weeks.

The weedingexperienceof the larger Compaqengineeringgroup discussedin the next chapterprovidesanotherdatapoint. They wereusinga 100 GB diskcluster, andtheir builds producedroughly 10 GB of derived files per day. Theywrote a cron job that ran every night, and that automaticallyperformeda weedwhenever thediskusagepassedapredeterminedthreshold.Thissamejob alsoranautomaticallyduring thedaywith a higherthresholdto make surethata spike inthe rateof disk consumptionwould not fill the disk beforethe next nightly run.Weedsgenerallyoccurredonceor twice aweek.

How longdoesit taketheweederto run?Recallfrom Chapter8 thattheweederrunsin two phases:a mark phasein which the cacheentriesandderived files tokeepare discovered, followed by a deletionphasein which the function cacheandrepositorydo cacheentrydeletionandderivedfile deletionin parallel. In ourexperience,the deletionphasetakessignificantlylongerthanthemark phase.Inparticular, thedeletionphasetakeson theorderof 10–15minuteswhenweedingagainsta4GBdisk. A weedby theCompaqengineeringgroupagainsttheir100GBdisk generallytook aboutanhour. We expectthedeletiontime will scalelinearlywith thenumberof cacheentriesbeingdeleted,which in turn is boundedlinearlyby thesizeof thebackingdisk.

The performanceof the weeder’s mark phaseis a function of the numberofgraphlog entriesbufferedin memory:thesmallerthebuffer, themorepassesoverthe graphlog are required. As onepoint along the time-spacecurve, we found

159

Page 172: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

that weedinga graphlog containingover 30,000entriesusinga buffer of 10,000entriesrequiredfour scansof thedisk file and10 secondsof elapsedtime. As thesizeof thegraphlog increases,morescansof thedisk file will berequired,but weexpectthattheentiremarkphasewill requirenomorethantenminutesonacachecontainingonemillion cacheentries.

9.6 Inter processCommunication

For interprocesscommunication,VestausesSRPC,a simpleRPC(remoteproce-durecall) protocolandlibrary for C++ thatwe implementedon top of TCPsock-ets. SRPCdoesnot includeanautomaticstubgenerator, soall of our marshalingand unmarshalingstubswere written by hand. To simplify stub coding, SRPCprovidesmethodsfor sendingandreceiving commondatatypeslike integers,null-terminatedstrings,arraysof bytes,andsequences.TheSRPCimplementationusesTCPkeepalivesto detectnetwork partitionsandotherconnectionfailures.

Exceptwherenotedin themorerecentcross-repositoryexperiments,all of thetestsdescribedabove were run on our local AN2 network. AN2 is a switchedAsynchronousTransferMode (ATM) network that uses155 megabit per second(Mbit/sec) links. On this network, our RPC packageperformsas follows. Theround-tripelapsedtime for a null RPC(i.e., no argumentsor results)is 1.2 mil-lisecondson average.In simpletestsperformedover varyingargumentandresultsizes,clientssendargumentsat anaveragerateof 75 to 100Mbit/sec,andreceiveresultsat anaveragerateof 65 to 132Mbit/sec.

160

Page 173: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Chapter 10

Conclusions

Thegoalof theVestaprojecthasbeento provideasolidbasefor softwareconfigu-rationmanagementby dealingwell with thecoreproblemsof sourcemanagementandbuilding, in a way that scalesup to very large softwareprojects(tensof mil-lionsof linesof code),yet alsoworkswell for smallerones.

To theextentthatwe areableto evaluateit, Vestahasmetits goal.Thesourcemanagementsystempreserves sourcecodeimmutably and immortally, supportsbothsimplelinearversioningandarbitrarily complex branchingfor paralleldevel-opment,makes all versionsdirectly accessiblethroughthe file system,andpro-videsvery fastcheckout andcheckinusingcopy-on-write. It supportsdistributeddevelopmentwith sourcereplication,cross-repositorycheckout andcheckin,andcross-realmaccesscontrol. The build systemprovides repeatable,incremental,andconsistentbuilds. Despiteproviding muchstrongerguaranteeson build con-sistency, it runsasfastasMakefor scratchbuilds,andfasterfor incrementalbuilds.Thebuilderalsosupportsparallelbuildsonmultiplemachines.Thesystemmodel-ing languageis modular, flexible, andgeneral.It allows thedescriptionof a large,complex softwaresystemto bebrokendown into small,relatively simplemodules.It allows new build toolsto beintegratedby codingwithin themodelinglanguage,with no needto modify thebasesystem.

For over two and a half years,Vestawas in daily useby Compaq’s Aranagroup,a teamof about130developersworking on a largemicroprocessordesign.Theteamwassplit into two subgroups,onein New Englandandtheotherin Cal-ifornia, eachwith its own repository. Both the chip designitself andthe team’scustomdesignsoftware were storedin the repositoriesand processedusing thebuild system.Thefinal codebaseconsistedof about700,000lines. We foundthisexperiencehighly valuablein validatingtheVestadesign,shakingout bugsin theimplementation,and exposingthe needfor varioussmall featuresthat were not

161

Page 174: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

initially anticipated.The Arana group in turn was pleasedwith Vesta. They have statedthat its

strongsupportfor parallelsourcedevelopmentandreproduciblebuildssavedthemconsiderabletime (3 to 6 monthsin thearchitecturaldesignphasealone),andthatthedistributeddevelopmentfeaturesprovidedanswersto someextremelydifficultproblemsthey facedin bicoastalsoftwareanddesigndatabasemanagement.TheyalsofoundVestato be extremelyusefulfor trackingdown difficult bugs,andex-pectedit to becomeevenmorevaluableastheprojectdrew closerto completion.

Somequestionsdo remain,including scalability, generality, easeof use,andconversionandlearningcosts.

The Vestaimplementationshows excellentperformancewhenmanagingandbuilding the software projectswe have testedit on, but our goal was to supportprojectsat leastanorderof magnitudelarger. Wehavearguedatappropriatepointsthroughoutthis reportthatouralgorithmsandimplementationstructurecanbeex-pectedto scaleup well, but our casecannotreally beconvincing until thesystemis put into useon thescaleit wasintendedfor. In thefuturewehopeto furthertestVesta’s scalabilityby finding additionalusersin the freesoftwarearena.PortstoAlphaLinux and32-bit Intel Linux have beencompleted,andCompaqhasagreedto releasethesystem’s sourcecodeto thepublic undertheLGPL [19]. Thecom-pletesourceswill soonbemadeavailablethroughtheVestawebsite[51].

TheVestabuilder alwaysbuilds derivedobjectsfrom source,but doesits owncachingof build stepsto save work. This paradigmlimits thebuilder’s generalitysomewhat,becausesomesoftwaretool setsdonotwork bestwhenbuilding directlyfrom sourceandgiving Vestacontrol over eachindividual step. For example,anincrementallinker increasesthespeedof linking by modifyingapreviously linkedprogram,replacingonly thepartsthathavechanged,ratherthanrelinkingtheentireprogram. Somelanguagecompilers(suchasthe Modula-3compiler)work mostefficiently whengiven all the sourcefiles thatgo into a programat once;this al-lows themto parseinterfacefiles only onceandcachetheresultsfor reuse,ratherthanreparsingthemfor eachimplementationfile thatusesthem.Somebuild toolsmayaskfor humaninterventionduringthebuild process.Generally, Vestacanbeusedwith toolsof thesekindsby bypassingtheincrementalfeaturesandscriptingany humaninteractionthat would otherwisebe required,but therecanbe a per-formancepenaltyin doing so. For example,the Arana grouphadone tool thatoperatedin an incrementalmode,andin fact it wasin theprocessof beingmodi-fied to have evenmoreincrementalfeaturesat thetime they beganto adoptVesta.Although disablingthesefeaturesmadethe tool slower, thegroupdecidedthat itwasworthwhilepayingthisperformancecostin orderto getVesta’sotherbenefits.Ultimately they wereableto modify thetool to work betterwith Vestaandgetbackagooddealof thelostperformance.

162

Page 175: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

WehavefoundVestafairly easyto usefor ourown softwaredevelopmentactiv-ities,asdid theAranagroup.OnceVestawasmatureenoughto beusedto developitself, wehappilyswitchedto it andnever wantedto goback.OurexperiencewithVesta-1alsotaughtusmuch;Vesta-2preservesthestrengthsof Vesta-1while cor-rectingits major deficiencies,both in easeof useandperformance.Somepiecesthatcouldmake Vesta-2easierto usearemissingat this writing, however, includ-ing agraphicaluserinterface,toolsto synthesizesimplemodelsautomatically, andtutorialuserdocumentation.TheAranagroupdevelopedtheirown specializedver-sionsof eachof theseitemsfor their own environment,however, andin thefuturewemaybeableto generalizesomeof themfor useby others.

BecauseVestarepresentsa comprehensive new approach,softwaredevelop-mentgroupswill incur somecostsin switchingto it. Groupswith large softwarebasesalreadyunderdevelopmentusing existing configurationmanagementsys-temswill needconversiontools. (In workingwith theAranagroup,wewerefortu-nateto comein at thebeginningof theproject,whenvery little of thechip designcodebasehadyet beenwritten.) We have severalunpublisheddesignsfor toolstoeasetheconversionfrom Makefilesto models,anddesigningtoolsto convert fromRCS,CVS,andtheliketo therepositorywouldbefairly straightforward.However,we have not implementedsuchtools to date.Therewill alsobea trainingcostinlearningto useVesta.Wehave tried to designtherepositorytoolsto besimpleandintuitive to use,but they dodiffer somewhatfrom existingsystems.A moreseriousconcernis the modelinglanguage.We have greatlysimplified andimproved thelanguagesinceVesta-1,andwehave madesurethatthemodelsusersnormallyseeandwrite arelittle morethanlistsof sourcefileswith afew linesof boilerplate.Yetwe anticipatethatthelanguagemaystill beanobstacleto acceptance;naive userswill find it unfamiliar, andsophisticateduserswill have a greatdealof learningtodo if they needto modify the complex modelsin the standardconstructionenvi-ronment.(SeveralAranausersdid do thissuccessfully;a few becamequiteexpertatwriting models.)

In summary, theresearchphaseof theVestaprojectis complete,thesystemhasseenserioususewithin Compaqfor morethantwo years,andwe arenow on theverge of releasingit asfreesoftwarefor externaluse. We believe thesystemhasmany strengthsandwill serve its futureuserswell.

163

Page 176: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Appendix A

The VestaSystemDescriptionLanguage

A.1 Intr oduction

Thisappendixdescribestheformalsyntaxandsemanticsof theVesta-2SystemDe-scriptionLanguage(SDL). Weexpectit will beusedasareferenceby Vestausers.Becausethis descriptionis meantto becompleteandunambiguous,its treatmentis ratherformal. A lessformal languagereferenceis available[45].

In Vesta,theinstructionsfor building a softwareartifactarewritten asanSDLprogram.Evaluatingtheprogramcausesthesoftwaresystemto beconstructed;theprogram’s resultvaluetypically containsthederivedfiles producedby theevalua-tion.

TheVestaSDL is a functionallanguagewith lexical scoping. Its valuespaceincludesBooleans,integers,texts, lists (similar to LISP lists), sequencesof name-valuepairscalledbindings, closures,andauniqueerrorvalue.

The languageis dynamically typed; that is, types are associatedwith run-time valuesinsteadof with static namesand expressions. Even without statictypechecking,thelanguageis stronglytyped:anexecutingVestaprogramcannotbreachthelanguage’s typesystem.Theexpectedtypesof parametersto languageprimitivesaredefined,andthosetypesarechecked whenthe primitivesareeval-uated. The languageincludesprovisionsfor specifyingthe typesof user-definedfunction argumentsand local variables,but thesetype declarationsarecurrentlyunchecked.

Thelanguagecontainsroughly60 primitive functions.Thereis a run toolprimitive for invoking external tools like compilersandlinkersasfunction calls.Externaltoolscanbeinvokedfrom Vestawithout modification.

164

Page 177: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Conceptually, every software artifact built with Vestais built from scratch,therebyguaranteeingthat the resultingartifact is composedof consistentpieces.Vestausesextensive cachingto avoid unnecessaryrebuilding. Vestarecordssoft-waredependenciesautomatically. The techniquesby which the implementationcachesfunctioncallsanddeterminesdependenciesaredescribedin Chapters6 and7 of this report.

A.2 Lexical Conventions

This sectiondefinesthemeta-notationandterminalsusedin subsequentsections.SectionA.3 introduceseachlanguageconstructby giving its syntaxandsemantics.Thesyntaxof thecompletelanguageis givenin SectionA.4.

A.2.1 Meta-notation

Nonterminalsof thegrammarbegin with anuppercaseletter, areat leasttwo char-actersin length, and include at leastone lowercaseletter. Except for the fourterminalslisted in SectionA.2.2 below, eachof which denotesa classof tokens,theterminalsof thegrammararecharacterstringsnotof this form.

Thegrammaris written in a variantof BNF (Backus-NaurForm). Themeta-charactersof thisnotationare:

::= | [ ] { } * + ‘ ’

Themeaningof themetacharactersis asfollows:

NT ::= Ex Non-terminalNT rewritesto expressionExEx1 | Ex2 Ex1or Ex2[ Ex ] optionalExt

Ex u meta-parenthesesfor groupingEx* zeroor moreEx’sEx* , zeroor moreEx’s separatedby commas,trailing commaoptionalEx* ; zeroor moreEx’s separatedby semicolons,trailing semioptionalEx+ oneor moreEx’sEx+, oneor moreEx’s separatedby commas,trailing commaoptionalEx+; oneor moreEx’s separatedby semicolons,trailing semioptional‘s’ theliteral characteror charactersequences

Whenusedasterminals,squarebrackets,curly brackets,andverticalbarappearinsinglequotesto avoid ambiguitywith thecorrespondingmetacharacters(i.e., ‘ [ ’,‘ ] ’, ‘

t’, ‘ u ’, ‘ | ’).

165

Page 178: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

A.2.2 Terminals

Thefollowing namesareusedasterminalsin thegrammar. They denoteclassesoftokens,andaredefinedpreciselyin SectionA.4.3.

Delim A pathnamedelimiter. Either forward or backward slashesareallowedwithin pathnames,but notboth.

Integer An integer, expressedin eitherdecimal,octal,or hexadecimal.

Id An identifier. An identifier is any sequenceof letters,digits, periods,andun-derscoresthatdoesnot representaninteger. For example,foo and36.fooareidentifiers,but 36and0x36arenot.

Text A text string.Textsareenclosedin double-quotes.They maycontainescapesequencesandspaces.

Commentsandwhitespacefollow C++conventions.A commenteitherbeginswith // andendswith thefirst subsequentnewline, or beginswith /* andendswith*/ (the latter form doesnot nest).Of course,thesedelimitersareonly recognizedoutsidetext literals. White spacedelimits tokensbut is otherwiseignored(exceptthat theSpacecharacter, theASCII characterrepresentedby thedecimalnumber32, is significantwithin text literals). The grammarprohibitswhite spaceotherthantheSpacecharacterwithin text literals.

Thenamesof thebuilt-in functionsbegin with anunderscorecharacter, andtheidentifier consistingof a singleperiod(i.e., “.”) playsa specialrole in the VestaSDL. It is thereforerecommendedthatVestaprogramsavoid definingidentifiersoftheseforms.

A.3 Semantics

The semanticsof programswritten in the VestaSDL aredescribedby a functionEval that mapsa syntacticexpressionanda context to a value. That is, Eval(E,C) returnsthevalueof thesyntacticexpressionE in thecontext C. In additiontosyntacticexpressions(denotedby thenon-terminalExpr in thegrammar),thedo-main of Eval includesadditionalsyntacticconstructs.Someof theseadditionalconstructsaredefinedby theconcretegrammar, while othersareintroducedas“in-termediateresults”duringtheevaluationprocess.Thelatterarenotedwheretheyareintroduced.Eachvaluereturnedby Eval is in theVestavaluespace,describedin thenext section.Thecontext parameterC to Eval is avalueof typet bindingintheVestavaluespace.

166

Page 179: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Typename Valuesof thetypet bool true, falset int integerst text arbitrarybytesequencest list sequencesof zeroor morearbitraryvaluest binding sequencesof zeroor more(name, value) pairst closure closures,eachof which is a triple v ew f w bxt err errt value unionof all of theabove types

TableA.1: Thetypesandvaluesof theVestalanguage.

A.3.1 ValueSpace

Valuesaretyped.Thetypesandvaluesof thelanguageareshown in TableA.1.The valuestrue, false, emptylist(the list of length zero), emptybinding(the

binding of lengthzero),anderr arenot to be confusedwith the languageliteralsTRUE, FALSE, <>, [] , andERRthatdenotethosevalues.

Thetypet bool containstheBooleanvaluestrueandfalse, denotedin thelan-guageby theliteralsTRUEandFALSE.

Thetypet int containsintegersoveratleasttherangey 231 to 231 y 1; theexactrangeis implementationdependent.

Thetypet text containsarbitrarysequencesof 8-bit bytes.This typeis usedtorepresenttext literals (quotedstrings)in SDL programsaswell asthecontentsoffiles introducedthroughthe Files nonterminalof the grammar. Consequently, animplementationmustreasonablysupporttherepresentationof largevaluesof thistype(millions of bytes),but is not requiredto supportefficient operationson largetext values.

Thetypet list containssequencesof values.Theelementsof a list neednotbeof thesametype.

Thetypet bindingcontainssequencesof pairs z ti wH{ i | , in whicheachti isanon-emptyvalueof typet text, each{ i is anarbitraryVestavalue(i.e.,of typet value),andthe ti areall distinct. Notethatbindingsaresequences:they areordered.Thedomainof a binding is thesequenceof namesti at its top level. Bindingsmaybenested.

Bindingsplay an importantrole in theVestalanguage.They areusedto rep-resenta varietyof interestingobjects.For example,flat bindingsthatmapnamesto texts can be usedto representcommand-lineswitchesand environmentvari-ables;bindingsthatcontainnestedbindingscanbeusedto representfile systems;

167

Page 180: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

andbindingsthatmapnamesto closurescanbeusedto representinterfaces.Sec-tion A.3.4.5describestheprimitive functionsandoperatorsfor manipulatingbind-ings,includingthreeprimitivesfor combiningtwo bindings.

Thetypet closurecontainsclosurevaluesfor theprimitiveoperatorsandfunc-tions(definedin SectionA.3.4) aswell asfor user-definedfunctions.In a closurev ew f w bx :

s e is a functionbody(i.e.,aBlock asperthegrammarbelow);

s f is a list of pairs z ti w ei | , whereti is a t text value(a formalparametername)andei is eitherthedistinguishedexpressionv emptyExprx or anExpr (a de-fault parametervalue);and

s b is avalueof typet binding(theclosurecontext).

The type t err consistsof the single distinguishedvalue err, denotedin thelanguageby theliteral ERR. Programmerscanusethisvalueasthey choose;it hasno predefinedsemantics.

A.3.2 Type Declarations

Thelanguageincludesa rudimentarymechanismfor declaringtheexpectedtypesof valuescomputedduringevaluation.Thegrammardefinesa smallsub-languageof type expressions,which includesthe ability to give namesto typesandto de-scribeaggregatetypes(lists, bindings,functions)with varying degreesof detail.Typeexpressionsmay be attachedto function argumentsandresultsandto localvariables,indicatingthetypeof theexpectedvaluefor theseidentifiersandexpres-sionsduringevaluation.

TheVestaevaluatorcurrentlytreatstypenamesandtypeexpressionsassyntac-tically checkedcomments;it performsno otherchecking.Futureimplementationsmay type-checkexpressionsat run time andreportan error if the valuedoesnotmatchthespecifiedtype (accordingto someasyet unspecifieddefinitionof whatit meansfor avalueto “match” a typespecification).

The syntaxfragmentsandsemanticdescriptionsin subsequentsectionsomitany furtherreferenceto typeexpressionsentirely.

A.3.3 Evaluation Rules

Theevaluationof aVestaprogramcorrespondsto theabstractevaluation:

Eval( M([]) , C-initial)

168

Page 181: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

class val {public:

operator int();// converts Vesta t_int or t_bool to C++ int

val(int);// converts a C++ integer to a Vesta t_int

int operator== (val);// compares two Vesta values, returning true (1)// if they have the same type and are equal, and// false (0) otherwise

}

TableA.2: A C++ classdeclarationfor Vestavalues.

static val true; // value of literal TRUEstatic val false; // value of literal FALSEstatic val emptylist; // value of literal < >static val emptybinding; // value of literal [ ]static val err; // value of literal ERR

TableA.3: Definitionsof constantsusedby thepseudo-code.

whereM is theclosurecorrespondingto thecontentsof an immutablefile (a sys-tem model) in the VestarepositoryandC-initial is an initial context. M’s modelshouldhave thesyntacticform definedby thenonterminalModeldescribedin Sec-tion A.3.3.13below. C-initial definesthenamesandassociatedvaluesof thebuilt-in primitive operatorsandfunctionsdescribedin SectionA.3.4below.

Thedefinition of Eval by casesfollows. UnlessE is handledby oneof thesecases,Eval(E,C) yieldsa runtimeerror. As mentionedabove, thedomainof Evalincludesthelanguagegeneratedby theconcretegrammarasapropersubset.Thus,in someof thecasesbelow, theexpressionE canariseonly asanintermediateresultof anothercaseof Eval. Thesecasesareexplicitly noted.

Thepseudo-codethatdefinesthevariouscasesof Eval andtheprimitive func-tionsshouldbereadlike C++. Thatcodeassumesthedeclarationfor therepresen-tationof Vestavaluesshown in TableA.2. Notethattheoperator== is theoneinvokedby usesof “==” in theC++ pseudo-code.It is not to beconfusedwith theprimitive equalityoperatordefinedon variousVestatypesin SectionA.3.4. Thepseudo-codealsorefersto theconstantsshown in TableA.3.

For convenience,thepseudo-codeadoptsthefollowing notationalconventions:

169

Page 182: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

s Eval is definedby casesratherthanby oneC++ functionwith anenormousembeddedcaseselection.

s Recursive referencesto Eval appearinline in thesameform that is usedtoidentify theindividual cases.

s Primitive functionsof the Vestalanguage,whosenamesbegin with an un-derscore,areinvoked inline from thepseudo-codeasif they wereordinaryC++ functions. The primitive operatorsof the Vestalanguageareinvokedin this way too; for example,whenthe pseudo-coderefersto operator+,itmeanstheVestaprimitive function,not theC++ operator. Notethatsomeofthe Vestaoperatorsareoverloadedby type, but not by arity. For example,operator+is definedon integers,texts, lists,andbindings,but it alwaystakestwo arguments.

s In the pseudo-codefor rules that containthe terminal Id, the variable iddenotesthevalueof theId representedasa t text.

s If the pseudo-statementerror is reached,evaluationhaltswith a runtimeerrorandappropriateerrormessage.No valueis produced.

In eachof the following sections,we first presentthe relevantportionsof thelanguagesyntax.We thenpresenttheevaluationrulesthatapplyto thosesyntacticconstructs.Thecompletelanguagesyntaxis givenin SectionA.4.

A.3.3.1 Expr

Syntax:

Expr ::= if Expr then Expr else Expr | Expr1Expr1 ::= Expr2 { => Expr2 }*Expr2 ::= Expr3 { || Expr3 }*Expr3 ::= Expr4 { && Expr4 }*Expr4 ::= Expr5 [ { == | != | < | > | <= | >= } Expr5 ]Expr5 ::= Expr6 { AddOp Expr6 }*AddOp ::= + | ++ | -Expr6 ::= Expr7 { MulOp Expr7 }*MulOp ::= *Expr7 ::= [ UnaryOp ] Expr8UnaryOp ::= - | !Expr8 ::= Primary [ TypeQual ]Primary ::= ( Expr ) | Literal | Id | List

| Binding | Select | Block | FuncCall

170

Page 183: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

The grammarlists the operatorsin increasingorder of precedence.The binaryoperatorsateachprecedencelevel areleft-associative.

Evaluation Rules:

The evaluationrules for conditional,implication, disjunction,conjunction,com-parison,AddOp, MulOp, UnaryOp,andparenthesizedexpressionsareshown inTablesA.4 andA.5. Thereareseven remainingpossibilitiesfor a Primary: Lit-eral,Id, List, Binding,Select,Block, andFuncCall.Thesearetreatedseparatelyinsubsequentsections.

A.3.3.2 Literal

Syntax:

Literal ::= ERR | TRUE | FALSE | Text | Integer

Evaluation Rules:

Eval( ERR , C) = errEval( TRUE , C) = trueEval( FALSE , C) = falseEval( Text , C) = the corresponding t_text valueEval( Integer, C) = the corresponding t_int value

In theText evaluationrule, theC++ interpretationof escapecharactersis used.In theIntegerevaluationrule,evaluationhaltswith a runtimeerrorif theintegeristoo largeor smallto berepresentedby theimplementation.

A.3.3.3 Id

Evaluation Rules:

Eval( Id , C) = _lookup(C, id)

As definedin SectionA.3.4.5, lookup (b, nm) is the valueassociatedwith thenon-emptynamenm in thebindingb. Theevaluationhaltswith a runtimeerror ifnm is emptyor is not in b’s domain.

A.3.3.4 List

Syntax:

List ::= < Expr*, >

171

Page 184: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

// conditional expressionEval( if Expr_1 then Expr_2 else Expr_3 , C) ={

val b = Eval( Expr_1 , C);if (_is_bool(b) == false) error;if (b == true) return Eval( Expr_2 , C);else return Eval( Expr_3 , C);

}

// conditional implicationEval( Expr_1 => Expr_2 , C) ={

val b = Eval( Expr_1 , C);if (_is_bool(b) == false) error;if (b == false) return true;b = Eval( Expr_2 , C);if (_is_bool(b) == false) error;return b;

}

// conditional OR (disjunction)Eval( Expr_1 || Expr_2 , C) ={

val b = Eval( Expr_1 , C);if (_is_bool(b) == false) error;if (b == true) return true;b = Eval( Expr_2 , C);if (_is_bool(b) == false) error;return b;

}

// conditional AND (conjunction)Eval( Expr_1 && Expr_2 , C) ={

val b = Eval( Expr_1 , C);if (_is_bool(b) == false) error;if (b == false) return false;b = Eval( Expr_2 , C);if (_is_bool(b) == false) error;return b;

}

Table A.4: Evaluation rules for conditional, implication, disjunc-tion, and conjunction expressions. As defined in Section A.3.4.7,is bool (b) is true if b is avalueof typet boolandfalseotherwise.

172

Page 185: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

// comparisonEval( Expr_1 == Expr_2 , C) =

operator==(Eval( Expr_1 , C), Eval( Expr_2 , C))Eval( Expr_1 != Expr_2 , C) =

operator!=(Eval( Expr_1 , C), Eval( Expr_2 , C))Eval( Expr_1 < Expr_2 , C) =

operator< (Eval( Expr_1 , C), Eval( Expr_2 , C))Eval( Expr_1 > Expr_2 , C) =

operator> (Eval( Expr_1 , C), Eval( Expr_2 , C))Eval( Expr_1 <= Expr_2 , C) =

operator<=(Eval( Expr_1 , C), Eval( Expr_2 , C))Eval( Expr_1 >= Expr_2 , C) =

operator>=(Eval( Expr_1 , C), Eval( Expr_2 , C))

// AddOpEval( Expr_1 + Expr_2 , C) =

operator+ (Eval( Expr_1 , C), Eval( Expr_2 , C))Eval( Expr_1 ++ Expr_2 , C) =

operator++(Eval( Expr_1 , C), Eval( Expr_2 , C))

// MulOpEval( Expr_1 - Expr_2 , C) =

operator- (Eval( Expr_1 , C), Eval( Expr_2 , C))Eval( Expr_1 * Expr_2 , C) =

operator* (Eval( Expr_1 , C), Eval( Expr_2 , C))

// UnaryOpEval( ! Expr , C) = operator!(Eval( Expr , C))Eval( - Expr , C) = operator-(Eval( Expr , C))

// parenthesizationEval( ( Expr ) , C) = Eval( Expr , C)

TableA.5: Evaluationrulesfor comparison,AddOp,MulOp,UnaryOp,andparenthesizedexpressions.

173

Page 186: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

The useof <, > asboth binary operatorsandlist delimitersmakes the grammarambiguous.SectionA.4.2explainshow theambiguityis resolved.

SyntacticDesugarings:

<Expr1 wH}�}�}�w Exprn> desugarsto <Expr1> + <Expr2 wH}�}�}Dw Exprn>

Here,‘+’ is theconcatenationoperatoron lists.

Evaluation Rules:

Eval( <> , C) = emptylistEval( < Expr > , C) = _list1(Eval( Expr , C))

As definedin SectionA.3.4.4, list1 (val) evaluatesto alist containingthesinglevalueval.

A.3.3.5 Binding

Syntax:

Binding ::= ‘[’ BindElem*, ‘]’BindElem ::= SelfNameB | NameBindSelfNameB ::= IdNameBind ::= GenPath = ExprGenPath ::= GenArc { Delim GenArc }* [ Delim ]GenArc ::= Arc | $ Id | $ ( Expr ) | % Expr %Arc ::= Id | Integer | Text

SyntacticDesugarings:

Thefollowing desugaringsapplyto BindElem’s within aBinding.

Id desugarsto Id = IdGenArcDelim = Expr desugarsto GenArc= ExprGenArcDelim GenPath= Expr desugarsto GenArc= [ GenPath= Expr ]$ Id = Expr desugarsto $ ( Id ) = Expr%Expr1 %= Expr2 desugarsto $ ( Expr1 ) = Expr2

TheSelfNameBsyntacticsugarallows namesfrom thecurrentscopeto becopiedinto bindingsmoresuccinctly. For example,thebindingvalue:

[ progs = progs, tests = tests, lib = lib ]

174

Page 187: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

caninsteadbewritten:

[ progs, tests, lib ]

TheGenPathsyntacticsugarallows bindingsconsistingof a singlepathto bewrittenmoresuccinctly. For example,thebindingvalue:

[ env_ovs = [ Cxx = [ switches = [ compile =[ debug = "-g3", optimize = "-O" ]]]]]

caninsteadbewritten:

[ env_ovs/Cxx/switches/compile =[ debug = "-g3", optimize = "-O" ]]

Evaluation Rules:

First, therulesfor constructingemptyandsingletonbindings:

Eval( [ ] , C) = emptybindingEval( [ Arc = Expr ] , C) = _bind1(id, Eval( Expr , C))

Here id is the t text representationof Arc. The conversionfrom an Arc to at text is straightforward. If theArc is an Id, the literal charactersof the identifierbecomethetext value. If theArc is anInteger, theliteral charactersusedto repre-sentthe integer in thesourceof themodelbecomethe text value. If theArc is aText, theresultof Eval(Arc, C) is used.As definedin SectionA.3.4.5, bind1 (id,v) evaluatesto a singletonbindingthatassociatesthenon-emptyt text id with thevaluev.

The $( Expr) syntaxallows the nameintroducedinto a binding to be com-puted:

Eval( [ $ ( Expr_1 ) = Expr_2 ] , C) =_bind1(Eval(Expr_1, C), Eval( Expr_2 , C))

Whenthefield nameis computedusingthe$ syntax,theexpressionmusteval-uateto anon-emptyt text (seethe bind1 primitive in SectionA.3.4.5below).

Thefollowing rulehandlesthecasewheremultiple BindElem’s aregiven.

Eval( [ BindElem_1, ..., BindElem_n ] , C) ={

val b1 = Eval( [ BindElem_1 ] , C);val b2 = Eval( [ BindElem_2, ..., BindElem_n ] , C);return _append(b1, b2);

}

As definedin SectionA.3.4.5, append (b1, b2) evaluatesto theconcatenationof thebindingsb1 andb2; it requiresthattheir domainsaredisjoint.

175

Page 188: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

A.3.3.6 Select

Syntax:

Select ::= Primary Selector GenArcSelector ::= Delim | !GenArc ::= Arc | $ Id | $ ( Expr ) | % Expr %Arc ::= Id | Integer | Text

A Selectexpressiondenotesa selectionfrom a binding,sothePrimarymusteval-uateto abindingvalue.

SyntacticDesugarings:

PrimarySelector%Expr% desugarsto PrimarySelector$ ( Expr )

Evaluation Rules:

TheDelim syntaxselectsa valueoutof abindingby name.

Eval( Primary Delim Arc , C) =_lookup(Eval( Primary , C), id)

Hereid is thet text valueof Arc, asdefinedin SectionA.3.3.5above.The$( Expr) syntaxallows theselectednameto becomputed:

Eval( Primary Delim $ ( Expr ) , C) =_lookup(Eval( Primary , C), Eval( Expr , C))

The! syntaxtestswhetheranameis in abinding’s domain:

Eval( Primary ! Id , C) =_defined(Eval( Primary , C), id),

As definedin SectionA.3.4.5, defined (b, nm) evaluatesto true if nm is non-emptyandin b’s domain,andto falseotherwise.As above, the$( Expr) syntaxcanbeusedto computethename:

Eval( Primary ! $ ( Expr ) , C) =_defined(Eval( Primary , C), Eval( Expr , C))

In bothcaseswheretheGenArcis acomputedexpression,theExpr mustevaluateto a t text.

176

Page 189: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

A.3.3.7 Block

Syntax:

Block ::= ‘{’ Stmt*; Result; ‘}’Stmt ::= Assign | Iterate | FuncDef | TypeDefResult ::= { value | return } Expr

SyntacticDesugarings:

return Expr desugarsto value Expr

That is, the keywords return andvalue aresynonyms, provided for stylisticreasons.The return /value statementmustappearat theendof a Block; thereis noanalogof theC/C++returnstatementthatterminatesexecutionof thefunctionin which it appears.

Evaluation Rules:

SincetheVestaSDL is functional,evaluationof astatementdoesnotproduceside-effects,but ratherproducesabinding.Evaluationof ablockoccursby augmentingthecontext with thebindingsproducedby evaluatingtheStmts,thenevaluatingthefinal Expr in theaugmentedcontext.

Eval( { value Expr } , C) = Eval( Expr , C)

Eval( { Stmt_1; ...; Stmt_n; value Expr } , C) ={

val b = Eval( { Stmt_1; ...; Stmt_n } , C);return Eval( Expr , operator+(C, b));

}

Noticethat this secondrule introducesanargumentto Eval in the“extended”lan-guagethatis notgeneratedby any non-terminalof thegrammar.

A.3.3.8 Stmt

Evaluation Rules:

Evaluatinga Stmtor sequenceof Stmtsproducesa binding. Notethat thebindingresultingfrom theevaluationof asequenceof Stmtsis simplytheoverlay(operator‘+’) of thebindingsresultingfrom evaluatingeachStmtin thesequence,anddoesnot includethecontext C.

177

Page 190: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Eval( { } , C) = emptybinding

Eval( { Stmt_1; Stmt_2 ...; Stmt_n } , C) ={

val b0 = Eval( Stmt_1 , C);val b1 = Eval( { Stmt_2; ...; Stmt_n }, operator+(C, b0));return operator+(b0, b1);

}

Theserulesapplyto constructsin the“extended”language.Therearethreepossi-bilities for aStmt:Assign,Iterate,andFuncDef.They arecoveredin thenext threesections.

A.3.3.9 Assign

Sincethe VestaSDL is functional,assignmentsdo not produceside-effects. In-stead,they introduceanew nameinto theevaluationcontext whosevalueis thatofthegivenexpression.

Syntax:

Assign ::= Id [ TypeQual ] [ Op ] = ExprOp ::= AddOp | MulOpAddOp ::= + | ++ | -MulOp ::= *

SyntacticDesugarings:

Id Op = Expr desugarsto Id = Id Op Expr

Evaluation Rules:

Eval( Id = Expr , C) = _bind1(id, Eval( Expr , C))

A.3.3.10 Iterate

Thelanguageincludesexpressionsfor iteratingoverbothlistsandbindings.Thereis also a map primitive definedon lists (SectionA.3.4.4) and bindings (Sec-tion A.3.4.5). map is moreefficient but lessgeneralthanthe language’s Iterateconstruct.

178

Page 191: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Syntax:

Iterate ::= foreach Control in Expr do IterBodyControl ::= Id | ‘[’ Id = Id ‘]’IterBody ::= Stmt | ‘{’ Stmt+; ‘}’

Thetwo Controlformsareusedto iterateover lists andbindings,respectively.

Evaluation Rules:

// iteration with single-statement bodyEval( foreach Control in Expr do Stmt , C) =

Eval( foreach Control in Expr do { Stmt } , C)

Thesemanticsof a loop areto conceptuallyunroll theloop n times,wheren is thelengthof the list or bindingbeingiteratedover. Theevaluationrulesfor iteratingover lists andbindingsareshown in TableA.6. Note that the iterationvariables(that is, Id , Id1 , andId2 in theTable)arenot boundin thebinding that resultsfrom evaluatingthe foreach statement.However, any assignmentsmadein theloopbodyare includedin theresultbinding.

Iterationstatementsare typically usedto walk over or collect partsof a listor binding. For example,TableA.7 presentsfunctionsfor reversinga list andforcountingthenumberof leavesin abinding.

A.3.3.11 FuncDef

Syntax:

The function definition syntaxallows a suffix of the formal parametersto haveassociateddefault values.

FuncDef ::= Id Formals+ [ TypeQual ] BlockFormals ::= ( FormalArgs )FormalArgs ::= TypedId*,

| { TypedId = Expr }*,| TypedId { , TypedId }* { , TypedId = Expr }+

The threealternatives for FormalArgs correspondto the casesin which no ar-gumentsaredefaulted,all argumentsaredefaulted,andsomeargumentsarede-faulted,respectively.

Note that the syntaxallows multiple Formals to follow the function name.As the rulesbelow describe,theuseof multiple Formalsproducesa sequenceofcurriedfunctions,all but thefirst of which is anonymous.

179

Page 192: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

// iteration over a listEval( foreach Id in Expr do { Stmt_1; ...; Stmt_n } , C) ={

val l = Eval( Expr, C);if (_is_list(l) == false) error;t_text id = Id; // identifier Id as a t_textval r = emptybinding;for (; !(l == emptylist); l = _tail(l)) {

val r1 = operator+(C, r);r1 = operator+(r1, _bind1(id, _head(l)));r = operator+(r, Eval( { Stmt_1; ...; Stmt_n } , r1));

}return r;

}

// iteration over a bindingEval(foreach [Id1=Id2] in Expr do {Stmt_1;...;Stmt_n}, C) ={

val b = Eval( Expr, C);if (_is_binding(b) == false) error;t_text id1 = Id1; // identifier Id1 as a t_textt_text id2 = Id2; // identifier Id2 as a t_textval r = emptybinding;for (; !(b == emptybinding); b = _tail(b)) {

val r1 = operator+(C, r);r1 = operator+(r1, _bind1(id1, _n(_head(b))));r1 = operator+(r1, _bind1(id2, _v(_head(b))));r = operator+(r, Eval( { Stmt_1; ...; Stmt_n } , r1));

}return r;

}

TableA.6: Evaluationrules for iteratingover lists andbindings. Asdefinedin SectionA.3.4.7, is list (l) is true if l is of typet list, andfalseotherwise; is binding (b) is true if b is of type t binding,andfalseotherwise.

180

Page 193: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

reverse_list(l: list): list{

res: list = <>;foreach elt in l do

res = <elt> + res;return res;

}

count_leaves(b: binding): int{

res: int = 0;foreach [ nm = val ] in b do

res += if _is_binding(val)then count_leaves(val) else 1;

return res;}

TableA.7: Two functionsdemonstratingtheuseof foreach to iterateovera list anda binding.

Evaluation Rules:

Eval( Id Formals_1 ... Formals_n Block , C) =_bind1(id, Eval( e , C1)),

where:

s e = LAMBDAFormals 1 ... LAMBDAFormals n Block

s C1 = operator+(C, bind1(id, Eval( e , C1)))

Noticetherecursivedefinitionof C1. Thisallowsfunctionsto beself-recursive,but not mutually recursive. Althoughthis recursive definition looksa little odd,itcanbe implementedby the evaluatorby introducinga cycle into the context C1.This is the only casewhereany Vestavalue can containa cycle (the languagesyntaxandoperatorsdo not allow cyclic lists or bindingsto be constructed),andthecycle is invisible to clients. Thereis no practicaldifficulty in constructingthecycle because,as we are aboutto see,the “evaluation” of a LAMBDAis purelysyntactic.

Also notethat this rule producesa LAMBDAconstructin the “extended”lan-guagethatis notgeneratedby any non-terminalof thegrammar. TheevaluationofLAMBDAproducesa t closurevalue v ew f w bx asdescribedin SectionA.3.1.

181

Page 194: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Thefollowing is thesimplecaseof LAMBDA, whereall actualparametersmustbegivenin any applicationof theclosure.Thereasonfor therestrictionon theuseof “.” asa formal parameteris treatedbelow in thesectionon functioncalls.

Eval( LAMBDA(Id_1, ..., Id_m)LAMBDAFormals_2 ... LAMBDAFormals_n Block , C) =

<LAMBDAFormals_2 ... LAMBDAFormals_n Block, f, C>,

wheref is a list of pairs z idi wHv emptyExprx | suchthat idi is thet text representationof Id i, for i in the closedinterval [1 w m]. If any of the identifiersId i is “.”, theevaluationhaltswith a runtimeerror.

In thetypical casewhereonly onesetof Formalsis specified(that is, n ~ 1),thefirst elementof theresultingclosurevalueis simplyaBlock.

Next is thegeneralcaseof LAMBDA, in which “default expressions”aregivenfor asuffix of theformalparameterlist. Functionsmaybecalledwith feweractualsthanformalsif eachformal correspondingto anomittedactualincludesanexpres-sionspecifyingthedefault valueto becomputed.Whentheclosureis applied,ifanactualparameteris missing,its formal’s expressionis evaluated(in thecontextof the LAMBDA) andpassedinstead. The following sectionon FuncCalldefinesthisprecisely.

Eval(LAMBDA(Id_1,... Id_k, Id_k+1=Expr_k+1,... Id_m=Expr_m)LAMBDAFormals_2 ... LAMBDAFormals_n Block , C) =

<LAMBDAFormals_2 ... LAMBDAFormals_n Block, f, C>,

wheref is a list of pairs z idi w expri | suchthat:

s idi is thet text representationof Id i, for i in [1 w m],

s expri is v emptyExprx , for i in [1 w k], and

s expri is Expr i, for i in [k � 1 w m].

As before,if any of the identifiersId i is “.”, the evaluationhaltswith a runtimeerror.

A.3.3.12 FuncCall

Syntax:

FuncCall ::= Primary ActualsActuals ::= ( Expr*, )

182

Page 195: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Evaluation Rules:

Thefunctioncall mechanismprovidesspecialtreatmentfor the identifierconsist-ing of a singleperiod,called the current environmentandpronounceddot. Dotis typically assigneda binding that containsthe tools, switches,andfile systemrequiredfor the rest of the build. The initial environment, C-initial (seeSec-tion A.3.3 above), doesnot bind dot (that is, “ defined( C-initial, ".")== false ”).

Whena functionis called,thecontext in which its bodyexecutesmaybind “.”to avalueestablishedasfollows:

s if the function is definedwith n formalsandcalledwith n or fewer actuals,then the value for “.” at the point of call is boundto the implicit formalparameternamed“.” in thecallee;

s if thefunctionis definedwith n formalsandcalledwith n � 1 actuals,thenthevalueboundto the implicit formal parameternamed“.” is thevalueofthelastactual.

Thus,thebindingfor “.”, if any, is passedthroughthedynamiccall chainuntilit is alteredeitherexplicitly by anAssignstatement(SectionA.3.3.9)or implicitlyby calling a function with an extra actualparameter. The pseudo-codeshown inTableA.8 makesthis precise.In this code,thecomparisonwith v emptyExprx hasnotbeenformalized,but it shouldbeintuitively clear.

A.3.3.13 Model

Syntax:

Model ::= Files Imports Block

Evaluation Rules:

ThenonterminalModel is treatedlike thebody of a functiondefinition (i.e., likea FuncDef(SectionA.3.3.11),but without the identifier namingthe functionandwith anemptylist of formalparameters).More precisely:

Eval( Files Imports Block , C) ={

val C0 = Eval( Files Imports , emptybinding);return Eval( LAMBDA() Block , _append(C0, C));

}

183

Page 196: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Eval( Primary ( Expr_1, ..., Expr_n ) , C) ={

val cl = Eval( Primary , C);if (_is_closure(cl) == false) error;

/* cl.e is the function body, cl.f are the formals, andcl.b is the context */

int m = _length(cl.f); // number of formalsif (n > m + 1) error; // too many actualsval C1 = cl.b; // t_bindingval f = cl.f; // t_list (of (t, e) pairs)

/* augment C1 to include formals bound to correspondingactuals */

int i;for (i = 1; i <= m; i++) {

val form = _head(f); // i-th formalval act; // corresponding actualif (i <= n)

act = Eval( Expr_i , C); // value for i-th actualelse {

if (form.e == <emptyExpr>) {// a required actual is missingerror;

}act = Eval( form.e , cl.b); // defaulted formal value

}C1 = operator+(C1, _bind1(form.t, act));f = _tail(f);

}

// bind "." in C1val dot;if (n <= m)

dot = _lookup(C, "."); // inherit "." from Celse

dot = Eval( Expr_n , C); // last actual value suppliedC1 = operator+(C1, _bind1(".", dot));

/* C1 is now a suitable environment. If the closure isa primitive function, then invoke it by a specialmechanism internal to the evaluator and return thevalue it computes. Otherwise, perform the following:

*/return Eval( cl.e , C1);

}

TableA.8: Evaluationrule for FuncCall.

184

Page 197: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

As this rule indicates,the Files and Imports constructsare evaluatedin anemptycontext, andthey addto theclosurecontext in which themodel’s LAMBDAis evaluated.In practice,thecontext C will alwaysbe the initial context C-initialwhenthis rule is applied(cf. SectionsA.3.3 andA.3.3.15).

The Files nonterminalintroducesvaluescorrespondingto the contentsof or-dinary files and directories. The Imports nonterminalintroducesclosurevaluescorrespondingto otherVestaSDL models.

TheevaluationruleshandleFilesandImportsclausesby augmentingthecon-text usingthe append primitive, therebyensuringthat thenamesintroducedbytheseclausesareall distinct, just asif theFilesandImportsclausesof theModelwere a single binding constructor. The Files and Imports clausesare evaluatedindependently:

Eval( Files Imports , C) =_append(Eval( Files , C), Eval( Imports , C))

The following two sectionsgive the rules for evaluatingFiles and Importsclausesindividually. It is worth noting that the evaluationcontext C is ignoredin thoserules.

A.3.3.14 Files

A Filesclauseintroducesnamescorrespondingto files or directoriesin theVestarepository. Generally, thesefiles or directoriesarenamedby relative paths,whichare interpretedrelative to the location of the model containingthe Files clause.Absolutepathsarepermitted,thoughthey areexpectedto berarelyused.

Syntax:

Files ::= FileClause*FileClause ::= files FileItem*;FileItem ::= FileSpec | FileBindingFileSpec ::= [ Arc = ] DelimPathFileBinding ::= Arc = ‘[’ FileSpec*, ‘]’

DelimPath ::= [ Delim ] Path [ Delim ]Path ::= Arc { Delim Arc }*Arc ::= Id | Integer | Text

EachFileItemin aFilesclausetakesoneof two forms:aFileSpecoraFileBind-ing. Eachform introduces(binds)exactlyonename.In theFileSpeccase,thenamecorrespondsto thecontentsof asinglefile or directory;in theFileBindingcase,the

185

Page 198: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

namecorrespondsto a bindingconsistingof perhapsmany files or directories.Inbothcases,theidentifierintroducedinto theVestanamingcontext or theidentifiersintroducedinto the binding canbe specifiedexplicitly or derived from an Arc inthePath.

For example,considerthefollowing files clause:

filesscripts = bin;c_files = [ utils.c, main.c ];

Supposethedirectorycontainingthis modelalsocontainsa directorynamedbinandfiles namedutils.c andmain.c . Thenthis files clauseintroducesthetwo namesscripts andc files into the context. The former is boundto abindingwhosestructurecorrespondsto thebin directory. Thelatteris boundto abindingthatmapsthenamesutils.c andmain.c to thecontentsof thosefiles,respectively. Thefile contentsarevaluesof typet text.

SyntacticDesugaring:

Whenmultiple FileItem’s aregiven in a FileClause,the files keyword simplydistributesovereachof theFileItem’s. Thatis:

files FileItem_1; ...; FileItem_n;

desugarsto:

files FileItem_1;...;files FileItem_n;

Whentheinitial Arc is omittedfrom aFileSpec,it is inferredfrom thepath.Inparticular:

files [ Delim ] { Arc Delim }* Arc [ Delim ];

desugarsto:

files Arc = [ Delim ] { Arc Delim }* Arc [ Delim ];

186

Page 199: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Evaluation Rules:

Multiple FileClausesareevaluatedindependently:

Eval( FileClause_0 FileClause_1 ... FileClause_n , C) ={

val C2 = Eval( FileClause_1 ... FileClause_n , C);return _append(Eval( FileClause_0 , C), C2);

}

That leavesonly two casesto consider:FileSpec(in which the initial Arc isspecified)andFileBinding.

// FileSpecEval( files Arc = DelimPath , C) = _bind1(id, v)

where:

s id is thet text representationof Arc, asdefinedin SectionA.3.3.5above.

s If DelimPathbeginswith aDelim, it is interpretedasanabsolutepath,whichmust neverthelessresolve to a file or directory in the Vestarepository. IfDelimPathdoesnotbegin with aDelim, it refersto afile or directorynamedrelative to thedirectoryof theenclosingModel.

s If the entity namedby DelimPath is a file, v is a t text value formed bytaking the file’s contents.If DelimPath namesa directory, v is a t bindingvalue constructedfrom the contentsof the directory, treating the files (ifany) in the directoryasabove (i.e., ast text values)andthe directories(ifany) recursively (i.e., asbindings). The membersof the resultingbindingarein an unspecifiedorder. If DelimPath doesnot correspondto eitheranextantfile or adirectory, theevaluationhaltswith a runtimeerror.

// FileBindingEval( files Arc = [ FileSpec_1, ..., FileSpec_n ] , C) =

_bind1(id, Eval( files FileSpec_1; ...; FileSpec_n , C))

Again, id is thet text representationof Arc.The FileBinding form of the Files clauseprovides a convenientway to cre-

atea binding containingmultiple FileSpecs.Without this construct,it would benecessaryto nameeachfile twice, oncein theFileSpecandoncein a subsequentbindingconstructor. Making a bindingwith FileBindingis semanticallysimilar toconstructinga file systemdirectory, with the additionalpropertythat thereis anenumerationorderfor thecomponentfiles.

187

Page 200: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Notice that the grammarand evaluationrules given above for FileSpecandFileBindingallow a generalArc on the left-handsideof eachequalsign,not justan Id. This wasdoneto simplify the definitionsanddesugaringrules. However,it would beuselessto write constructslike the following, which introducenamesthatcannotbereferencedin thebodyof themodel:

files33;34 = 34;"hash-table.c";"foo bar" = [ foo, bar ];

Therefore,weintroduceanadditionalrestriction:thecontext createdby aFilesclausemustbindonly namesthatarelegal identifiers;thatis, namesthatmatchthesyntaxof theId token.

If youneedto usefileswhosenamesarenot legal identifiers,youshouldeitherassignthemlegal nameswith the equalsign syntaxor embedthemin a binding.Somepossibilities:

// Choose a legal namefiles

f33 = 33;f34 = 34;hash_table.c = "hash-table.c";foo_bar = [ foo, bar ];

// Embed in a bindingfiles

f = [ 33, 34 ];src = [ "hash-table.c" ];

A.3.3.15 Imports

TheImportsclauseenablesoneVestaSDL modelto referenceanduseothers;thatis, it supportsmodulardecompositionof VestaSDL programs.

Syntax:

Imports ::= ImpClause*ImpClause ::= ImpIdReq | ImpIdOpt

Thereare two major forms of the Imports clause: one whereidentifiersarerequired(ImpIdReq),andonewherethey areoptional (ImpIdOpt). Both forms

188

Page 201: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

have two sub-formsin which either a single model or a list of modelsmay beimported.

First,considertheImpIdReqcase.Thisform is typically usedto importmodelsin the samepackageas the importing model. EachImpItemRin the ImpIdReqclausetakes oneof two forms: an ImpSpecRor an ImpListR. Eachform bindsexactlyonename.

ImpIdReq ::= import ImpItemR*;ImpItemR ::= ImpSpecR | ImpListRImpSpecR ::= Arc = DelimPathImpListR ::= Arc = ‘[’ ImpSpecR*, ‘]’

DelimPath ::= [ Delim ] Path [ Delim ]Path ::= Arc { Delim Arc }*Arc ::= Id | Integer | Text

In the ImpSpecRcase,the nameis boundto the t closurevalue that resultsfrom evaluationof thecontentsof afile accordingto theModelevaluationrulesofSectionA.3.3.13.For example,considertheImport clause:

import self = progs.ves;

Thisclausebindsthenameself to theclosurecorrespondingto thelocalprogs.vesmodelin thesamedirectoryasthemodelin which it appears.

In the ImpListR case,the nameis boundto a binding of suchvalues. Forexample:

import sub =[ progs = src/progs.ves, tests = src/tests.ves ];

This clausebindsthe namesub to a binding containingthe namesprogs andtests ; thesenameswithin thebindingareboundto theclosurescorrespondingtothemodelsnamedprogs.vesandtests.vesin thepackage’s src subdirectory. Forexample,theprogs.vesmodelcouldbeinvokedby writing “sub/progs() ”.

BecausetheImportsclauseoftenmentionsseveralfiles with namesthatsharea commonprefix, a syntacticform is provided to allow the prefix to be writtenonce.This is theImpIdOptform. It is usedto importmodelsfrom otherpackages.Thesemanticsaredefinedsothatmany identifiersareoptional;whenomitted,theydefault to thenameof thepackagefrom which themodelis beingimported.As intheImpIdReqcase,ImpIdOpthasformsfor importingbothsinglemodelsandlistsof multiple models.

189

Page 202: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

ImpIdOpt ::= from DelimPath import ImpItemO*;ImpItemO ::= ImpSpecO | ImpListOImpSpecO ::= [ Arc = ] Path [ Delim ]ImpListO ::= Arc = ‘[’ ImpSpecO*, ‘]’

Herearesomeexamplesof ImpIdOptimports:

from /vesta/west.vestasys.org/vesta importcache/12/build.ves;libs = [ srpc/2/build.ves, basics/5/build.ves ];

This examplebindsthenamecache to theclosurecorrespondingto version12 of that package’s build.vesmodel,and it binds the namelibs to a bindingcontainingthe namessrpc and basics , boundto versions2 and 5 of thosepackage’s build.vesmodels. (As the evaluationrules below describe,the threeoccurrencesof “ /build.ves ” in thisexamplecouldactuallyhavebeenomitted.)

SyntacticDesugaring:

Whenmultiple ImpItemR’s aregiven in a ImpIdReq,the import keyword dis-tributesovereachof theImpItemR’s. Thatis:

import ImpSpec_1; ...; ImpSpec_n;

desugarsto:

import ImpSpec_1;...;import ImpSpec_n;

Similarly, the from clausedistributesover theindividual importsof anImpI-dOpt. In particular:

from DelimPath import ImpItemO_1; ...; ImpItemO_n;

desugarsto:

from DelimPath import ImpItemO_1;...;from DelimPath import ImpItemO_n;

The useof from makes it optional to supply a namefor the closurevaluebeingintroduced;if thenameis omitted,it is derived from thePathfollowing theimport keyword asfollows:

190

Page 203: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

from DelimPath import[ Arc_1 = ] [ Delim ] Arc_2 { Delim Arc }* [ Delim ]

desugarsto:

import Arc =DelimPath Delim Arc_2 { Delim Arc }* [ Delim ]

whereArc is Arc1 if it is presentandis Arc2 otherwise.Similarly:

from DelimPath import Arc = [[ Arc1_1 = ] [ Delim ] Arc2_1 { Delim Arc }* [ Delim ],...,[ Arc1_n = ] [ Delim ] Arc2_n { Delim Arc }* [ Delim ] ]

desugarsto:

import Arc = [Arc_1 = DelimPath Delim Arc2_1 {Delim Arc }* [ Delim ],...,Arc_n = DelimPath Delim Arc2_n {Delim Arc }* [ Delim ] ]

whereArci is Arc1i if it is presentandis Arc2i otherwise.

Evaluation Rules:

Multiple ImpClause’s areevaluatedindependently:

Eval( ImpClause_0 ImpClause_1 ... ImpClause_n , C) ={

val C2 = Eval( ImpClause_1 ... ImpClause_n , C);return _append(Eval( ImpClause_0 , C), C2);

}

This leaves two fundamentalforms of the Imports clause,whosesemanticsaredefinedasfollows:

// ImpSpecREval( import Arc = DelimPath , C) =

_bind1(id, Eval( model , C-initial))

where:

s id is thet text representationof Arc, asdefinedin SectionA.3.3.5above.

191

Page 204: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

s Let f bethesequenceof DelimsandArcs thatconstitutetheDelimPath.

1. If f doesnot begin with a Delim, prepend“Delim Path0 Delim ”to f, wherePath0namesthedirectorycontainingthe Model in whichthis Importsclauseappears.

2. Look up thepath f in the Vestarepository. (SeeSectionA.3.3.16be-low.) If f namesadirectory, appendaDelim (if f doesn’t alreadyendinone)andthestring “build.ves ”, thenlook up theaugmentedpathf in the repositoryagain. If f doesnot namea directoryand its finalelementdoesnotendin “ .ves ”, appendthestring“ .ves ” to thefinalelementof f, andlook it up in therepositoryagain.

s modelis theVestaSDL Model representedby thecontentsof thefile in theVestarepositorynamedby the sequencef. If no suchexpressioncan beproduced(e.g., the file doesn’t exist, or can’t be parsedasan expression),evaluationhaltswith a runtimeerror.

// ImpListREval( import Arc = [ ImpSpecR_1, ..., ImpSpecR_n ] , C) =

_bind1(id, Eval( import ImpSpecR_1; ...; ImpSpecR_n , C))

Again, id is thet text representationof Arc.As with theFilesclause,andfor thesamereason,weaddonerestrictionto the

rules just given: the context createdby an Importsclausemustbind only namesthatarelegal identifiers;thatis, namesthatmatchthesyntaxof theId token.

A.3.3.16 FilenameInter pretation

Theevaluationrulesfor theFilesandImportsclausesdo not specifyhow these-quenceof ArcsandDelimsmakingup aDelimPathis convertedinto afilenameintheunderlyingfile system.While this is somewhatsystem-dependent,it is never-thelessintendedto beintuitive. In particular,

s Multiple adjacentDelimsarereplacedby asingleone.(Thegrammarabovedoesn’t permitadjacentDelims,but they canbeproducedby thedesugaringrules.)

s The VestaSDL syntaxallows the arbitrary intermingling of “ / ” and “ \ ”asarcseparators.However, theimplementationactuallyrequiresthatVestaprogramsuseoneor theotheruniformly. Whencreatinga filenamefrom asequenceof ArcsandDelims,theimplementationinsertstheappropriatearcseparatorrequiredby theunderlyingfile system.Thechoiceisnotinfluencedby thechoiceof Delim thatappearsin theVestaSDL program.

192

Page 205: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

s ThegrammarpermitsanArc to beanarbitraryText. An Arc in a filename,however, is forbiddento containa Delim character(i.e., forward or back-wardslash),andtheArcs “ .. ” and“ . ” areforbiddenin filenamesaswell.In particular, “ .. ” cannotbe usedto meanparent directoryand“ . ” can-not beusedto meancurrentdirectory. The “ .. ” notationis forbiddenfortechnicalreasonsrelatedto Vestacaching,while the “ . ” notationis sim-ply unimplemented.However, theemptyArc “” canbe usedto denotethecurrentdirectory.

A.3.4 Primiti ves

The primitive namesandassociatedvaluesdescribedbelow areprovided by theVestaSDL interpreterin C-initial, the initial context. Most of thesevaluesareclosureswith emptycontexts; thatis, they areprimitive functions.

In the descriptionsthat follow, the notationusedfor the function signaturesfollows C++, with theresulttypeprecedingthefunctionnameandeachargumenttype precedingthe correspondingargumentname. Defaulting conventionsalsofollow C++; if an argumentnameis followed by “= value”, thenomitting thecorrespondingactualargumentis equivalentto supplyingvalue.

Someof thefunctionsignaturesusetheC++ operatordefinitionsyntax,whichshouldbeunderstoodasdefininga functionwhosenameis not an Id in thesenseof thegrammarabove. Suchoperatornamescannotberebound.Theseoperatorsaretypically overloaded,asthedescriptionsbelow indicate.Usesof thesebuilt-inVestaprimitiveswithin C++ codearedenotedby theoperator syntax.

Thepseudo-codeof thissectionassumesthedefinitionof theVestavalueclassgivenat thestartof SectionA.3.3. Invocationof a Vestaoperatorprimitive withinthepseudo-codeis denotedby theoperator syntax.All otheroperatorsappear-ing in thepseudo-codedenotetheC++ operators.

In thesedescriptions,theargumenttypesrepresentthenaturaldomain;there-sult typeis thenaturalrange.If aprimitive functionis passedavaluethatliesout-sideits naturaldomain,evaluationhaltswith a runtimeerror. This type-checkingoccurswhentheprimitive functionis called,notbefore.

A.3.4.1 Functionson Type t bool

RecallthattrueandfalseareVestavalues,notC++ quantities.

t_booloperator==(t_bool b1, t_bool b2)

Returnstrue if b1andb2 arethesame,andfalseotherwise.

193

Page 206: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

t_booloperator!=(t_bool b1, t_bool b2)

operator!(operator==(b1, b2))

t_booloperator!(t_bool b)

Returnsthelogical complementof b.

A.3.4.2 Functions on Type t int

t_booloperator==(t_int i1, t_int i2)

Returnstrue if i1 andi2 areequal,andfalseotherwise.

t_booloperator!=(t_int i1, t_int i2) =

operator!(operator==(i1, i2))

t_intoperator+(t_int i1, t_int i2)

Returnsthe integer sum i1 + i2 unlessit lies outsidethe implementation-definedrange,in whichcaseevaluationhaltswith a runtimeerror.

t_intoperator-(t_int i1, t_int i2)

Returnsthe integer differencei1 - i2 unlessit lies outsidethe implementation-definedrange,in whichcaseevaluationhaltswith a runtimeerror.

t_intoperator-(t_int i) =

operator-(0, i)

t_intoperator*(t_int i1, t_int i2)

Returnstheintegerproducti1 * i2 unlessit liesoutsidetheimplementation-definedrange,in whichcaseevaluationhaltswith a runtimeerror.

t_int_div(t_int i1, t_int i2)

Returnsthe integerquotienti1 / i2 (that is, thefloor of the realquotient)unlessitlies outsidetheimplementation-definedrange,in which caseevaluationhaltswitha runtimeerror. This error is possibleonly if i2 is zeroor if i2 is -1 and i1 is thelargestimplementation-definednegative number.

194

Page 207: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

t_int_mod(t_int i1, t_int i2) =

operator-(i1, operator*(_div(i1,i2), i2))

t_booloperator<(t_int i1, t_int i2)

Returnstrue if andonly if i1 is lessthani2.

t_booloperator>(t_int i1, t_int i2) =

operator<(i2, i1)

t_booloperator<=(t_int i1, t_int i2)

Returnstrue if andonly if i1 is at mosti2.

t_booloperator>=(t_int i1, t_int i2) =

operator<=(i2, i1)

t_int_min(t_int i1, t_int i2) ={ if (operator<(i1, i2)) return i1; else return i2; }

t_int_max(t_int i1, t_int i2) ={ if (operator>(i1, i2)) return i1; else return i2; }

A.3.4.3 Functionson Type t text

Thefirst byteof a t text valuehasindex 0.

t_booloperator==(t_text t1, t_text t2)

Returnstrue if t1 andt2 areidenticalbytesequences,andfalseotherwise.

t_booloperator!=(t_text t1, t_text t2) =

operator!(operator==(t1, t2))

t_textoperator+(t_text t1, t_text t2)

Returnsthebytesequenceformedby appendingthebyte sequencet2 to thebytesequencet1 (concatenation).

195

Page 208: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

t_int_length(t_text t)

Returnsthenumberof bytesin thebytesequencet.

t_text_elem(t_text t, t_int i)

If 0 � i � length( t) , returnsa bytesequenceof length1 consistingof byte iof thebytesequencet. Otherwise,returnstheemptybytesequence.

t_text_sub(t_text t, t_int start = 0, t_int len = _length(t)) ={

int w = _length(t);int i = _min(_max(start, 0)), w);int j = _min(i + _max(len, 0), w);// 0 <= i <= j <= _length(t); extract [i..j)t_text r = "";for (; i < j; i++) r = operator+(r, _elem(t, i));return r;

}

Extractsfrom t andreturnsa bytesequenceof length len beginning at byte start.Note the boundarycasesdefinedby the pseudo-code;sub producesa runtimeerroronly if it is passedargumentsof thewrongtype.

t_int_find(t_text t, t_text p, t_int start = 0) ={

int j = _length(t) - _length(p);if (j < 0) return -1;int i = _max(start, 0);if (i > j) return -1;for (; i <= j; i++) {

int k = 0;while (k < _length(p) &&

_elem(t, i+k) == _elem(p, k)) k++;if (k == _length(p)) return i;

}return -1;

}

Findstheleftmostoccurrenceof p in t thatbeginsator afterpositionstart. Returnstheindex of thefirst byteof theoccurrence,or -1 if noneexists.

t_int_findr(t_text t, t_text p, t_int start = 0) =

196

Page 209: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

{int j = _length(t) - _length(p);if (j < 0) return -1;int i = _max(start, 0);if (i > j) return -1;for (; i <= j; j--) {

int k = 0;while (k < _length(p) &&

_elem(t, j+k) == _elem(p, k)) k++;if (k == _length(p)) return j;

}return -1;

}

Findsthe rightmostoccurrenceof p in t thatbeginsat or afterpositionstart. Re-turnstheindex of thefirst byteof theoccurrence,or -1 if noneexists.

A.3.4.4 Functionson Type t list

t_booloperator==(t_list l1, t_list l2)

Returnstrue if l1 andl2 arelists of thesamelengthcontaining(recursively) equalvalues,andfalseotherwise.

t_booloperator!=(t_list l1, t_list l2) =

operator!(operator==(l1, l2))

t_list_list1(t_value v)

Returnsa list containingasingleelementwhosevalueis v.

t_value_head(t_list l)

Returnsthefirst elementof l. If l is empty, evaluationhaltswith a runtimeerror.

t_list_tail(t_list l)

Returnsthe list consistingof all elementsof l, in order, except the first. If l isempty, evaluationhaltswith a runtimeerror.

t_int_length(t_list l)

Returnsthenumberof (top-level) valuesin thelist l.

197

Page 210: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

t_value_elem(t_list l, t_int i)Returnsthe i-th valuein the list l. If no suchvalueexists,evaluationhaltswith aruntimeerror. Thefirst valueof a list hasindex 0.

t_listoperator+(t_list l1, t_list l2)Returnsthelist formedby appendingl2 to l1.

t_list_sub(t_list l, t_int start = 0, t_int len = _length(l)){

int w = _length(l);int i = _min(_max(start, 0)), w);int j = _min(i + _max(len, 0), w);// 0 <= i <= j <= _length(l); extract [i..j)t_list r = emptylist;for (; i < j; i++) r = operator+(r, _elem(l, i));return r;

}Returnsthesub-listof l of lengthlen startingat elementstart. Notetheboundarycasesdefinedby thepseudo-code;sub producesaruntimeerroronly if it is passedargumentsof thewrongtype.

t_list_map(t_closure f, t_list l) ={

t_list res = emptylist;for (; !(l == emptylist); l = _tail(l)) {

t_value v = f(_head(l)); // apply the closure "f"res = operator+(res, v);

}return res;

}Returnsthelist thatresultsfrom applyingtheclosuref to eachelementof thelist l,andconcatenatingtheresultsin order. Theclosuref shouldtake onevalue(of typet value)asargumentandreturna valueof any type. If f hasthewrongsignature,theevaluationhaltswith a runtimeerror.

t_list_par_map(t_closure f, t_list l)Formallyequivalentto map, but theimplementationmayperformeachapplicationof f in aseparateparallelthread.Externaltoolsinvokedby run tool in differentthreadsmayberunsimultaneouslyondifferentmachines.If aruntimeerroroccursin onethread,theotherthreadsmaystill run to completionbeforetheevaluationterminates.

198

Page 211: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

A.3.4.5 Functionson Type t binding

t_booloperator==(t_binding b1, t_binding b2)

Returnstrue if b1 and b2 are bindingsof the samelength containingthe samenames(in order)boundto (recursively) equalvalues,andfalseotherwise.

t_booloperator!=(t_binding b1, t_binding b2) =

operator!(operator==(b1, b2))

t_binding_bind1(t_text n, t_value v)

If n is not empty, returnsa bindingwith thesingle(name,value)pair (n, v). If n isempty, theevaluationhaltswith a runtimeerror.

t_binding_head(t_binding b)

Returnsa bindingwith one(name,value)pair equalto thefirst elementof b. If bis empty, theevaluationhaltswith a runtimeerror.

t_binding_tail(t_binding b)

Returnsthebindingconsistingof all elementsof b, in order, exceptthefirst. If b isempty, theevaluationhaltswith a runtimeerror.

t_int_length(t_binding b)

Returnsthenumberof (name,value)pairsin b.

t_binding_elem(t_binding b, t_int i)

Returnsa bindingconsistingsolelyof the i-th (name,value)pair in thebindingb.If no suchpair exists, theevaluationhaltswith a runtimeerror. Thefirst pair of abindinghasindex 0.

t_text_n(t_binding b)

If length(b) = 1, returnsthenamepartof the(name,value)pair thatconsti-tutesb. Otherwise,theevaluationhaltswith a runtimeerror.

t_value_v(t_binding b)

199

Page 212: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

If length( b) differs from 1, the evaluationhaltswith a runtime error. Oth-erwise,let v be the valuepart of the (name,value)pair that constitutesb. Thisfunctionreturnsv.

t_bool_defined(t_binding b, t_text name)

If nameis empty, theevaluationhaltswith aruntimeerror. Otherwise,thefunctionreturnstrue if thebindingb containsapair(n, v) with n identicalto name, andfalseotherwise.

t_value_lookup(t_binding b, t_text name)

If nameis nonemptyandis definedin b, returnsthevalueassociatedwith it; other-wise,theevaluationhaltswith a runtimeerror.

t_binding_append(t_binding b1, t_binding b2)

Returnsa binding formedby appendingb2 to b1, but only if all the namesin b1andb2 aredistinct.Otherwise,theevaluationhaltswith a runtimeerror.

t_bindingoperator+(t_binding b1, t_binding b2) ={

val r = emptybinding;for (; !(b1 == emptybinding); b1 = _tail(b1)) {

val n = _n(_head(b1));val v;if (_defined(b2, n) == true)

v = _lookup(b2, n);else v = _v(_head(b1));r = _append(r, _bind1(n, v));

}for (; !(b2 == emptybinding); b2 = _tail(b2)) {

if (_defined(b1, _n(_head(b2)) == false)r = _append(r, _head(b2));

}return r;

}

Returnsa binding formedby appendingb2 to b1, giving precedenceto b2 whenbothb1andb2 contain(name,value)pairswith thesamename.

t_bindingoperator++(t_binding b1, t_binding b2) ={

200

Page 213: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

val r = emptybinding;for (; !(b1 == emptybinding); b1 = _tail(b1)) {

val n = _n(_head(b1));val v;if (_defined(b2, n) == true) {

val v2 = _lookup(b2, n);if (_is_binding(v2) == true) {

v = _v(_head(b1);if (_is_binding(v) == true)

v = operator++(v, v2);else v = v2;

}else v = v2;

}else v = _v(_head(b1));r = _append(r, _bind1(n, v));

}for (; !(b2 == emptybinding); b2 = _tail(b2)) {

if (_defined(r, _n(_head(b2)) == false)r = _append(r, _head(b2));

}return r;

}Similar to operator+,but performstheoperationrecursively for eachnamen thatis associatedwith abindingvaluein bothb1andb2.

t_bindingoperator-(t_binding b1, t_binding b2) ={

val r = emptybinding;for (; !(b1 = emptybinding); b1 = _tail(b1)) {

val n = _n(_head(b1));if (_defined(b2, n) == false)

r = _append(r, _head(b1));}return r;

}

Returnsa bindingformedby removing from b1 any pair (n, v) suchthat thenamen is definedin b2. Thevaluev associatedwith n in b2 is irrelevant.

t_binding_sub(t_binding b, t_int start = 0, t_int len = _length(b)){

int w = _length(b);int i = _min(_max(start, 0)), w);

201

Page 214: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

int j = _min(i + _max(len, 0), w);// 0 <= i <= j <= _length(b); extract [i..j)t_binding r = emptybinding;for (; i < j; i++) r = _append(r, _elem(b, i));return r;

}Returnsthesub-bindingof b of lengthlenstartingatelementstart. Notethebound-ary casesdefinedby thepseudo-code;sub producesa runtimeerroronly if it ispassedargumentsof thewrongtype.

t_binding_map(t_closure f, t_binding b) ={

t_binding res = emptybinding;for (; !(b == emptybinding); b = _tail(l)) {

// apply the closure "f"t_binding b1 = f(_n(_head(b)), _v(_head(b)));res = _append(res, b1);

}return res;

}Returnsthebindingthat resultsfrom applyingtheclosuref to each(name, value)pair of thebindingb, andappendingtheresultingbindingstogether. Theclosurefshouldtake thename(of typet text) andvalue(of typet value)asarguments,andreturna valueof type t binding. If f hasthewrongsignature,theevaluationhaltswith a runtimeerror.

t_binding_par_map(t_closure f, t_binding b)Formallyequivalentto map, but theimplementationmayperformeachapplicationof f in aseparateparallelthread.Externaltoolsinvokedby run tool in differentthreadsmayberunsimultaneouslyondifferentmachines.If aruntimeerroroccursin onethread,theotherthreadsmaystill run to completionbeforetheevaluationterminates.

A.3.4.6 SpecialPurposeFunctions

t_closure _selfUnlessredefined,thename self alwaysrefersto themodelin which it textuallyoccurs. In effect, every model imports itself underthis name,prior to the firstimport clausethatappearsexplicitly in theSDL programtext.

t_text_model_name(t_closure m)

202

Page 215: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Thevaluemmustbeamodel;thatis, aclosuredefinedby importinganimmutablefile from the Vestarepository. model name returnsa text valuethat givesonenamefor mwithin therepository. If amodelwith identicalcontentsin anidenticaldirectoryis presentatseverallocationsin therepository, thenamereturnedmaybethatof any of theselocations,notnecessarilytheonethatwasactuallyimportedinthecurrentevaluation.

t_text_fingerprint(t_value v)

The fingerprint primitive returnsa text representationof the given value’sfingerprint, a 128-bit internal identifier for the value. Fingerprintsarechosensothatwith very high probability, two differentvalueswill alwayshave differentfin-gerprints.A givenvaluemayhave differentfingerprintsin differentevaluationsorwhencomputedat differentpointsin thesameevaluation,but theimplementationtriesto avoid thiswhenpractical.

Specifically, a sourcewith a particularabsolutenamein the Vestarepositoryalwayshasthesamefingerprint,while two sourceswith differentnamesbut withthesamevaluewill have thesamefingerprintif they werefingerprintedby contentwheninsertedinto therepository. Seethedocumentationof thevadvanceprogramfor detailson whensourcesarefingerprintedby nameandwhenby content.A de-rivedvaluereturnedby any Vestaprimitiveotherthan run tool hasafingerprintthatdependsdeterministicallyon thefingerprintsof its arguments.Derivedvaluesreturnedby run tool have eitherarbitraryuniquefingerprintsor deterministiccontent-basedfingerprints;seeSectionA.3.4.8for details.

Thesepropertiesmake fingerprintsusefulasversionstampsfor Vestaevalu-ations,sometimesmoreuseful than model name. If m1, m2 aremodelswithidenticalcontentsthat residein identicaldirectories,then fingerprint(m1)= fingerprint(m2 ) will oftenbetrueevenwhen model name(m1) !=model name(m2) .

A.3.4.7 TypeManipulation Functions

t_text_type_of(t_value v)

Returnsatext valuecorrespondingto thetypeof thevaluevasshown in TableA.9.

t_bool_same_type(t_value v1, t_value v2) =

operator==(_type_of(v1), _type_of(v2))

t_bool_is_bool(t_value v)

203

Page 216: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Value Text returnedby type oftrue,false “t bool”integer “t int”bytesequence “t text”err “t err”list “t list”binding “t binding”closures “t closure”

TableA.9: Text valuesreturnedby the type of primitive for eachpossibleinputvalue.

Returnstrue if v is of typet bool; returnsfalseotherwise.

t_bool_is_int(t_value v)

Returnstrue if v is of typet int; returnsfalseotherwise.

t_bool_is_text(t_value v)

Returnstrue if v is of typet text; returnsfalseotherwise.

t_bool_is_err(t_value v)

Returnstrue if v is of typet err; returnsfalseotherwise.

t_bool_is_list(t_value v)

Returnstrue if v is of typet list; returnsfalseotherwise.

t_bool_is_binding(t_value v)

Returnstrue if v is of typet binding;returnsfalseotherwise.

t_bool_is_closure(t_value v)

Returnstrue if v is of typet closure;returnsfalseotherwise.

204

Page 217: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

A.3.4.8 Tool Invocation Function

t_binding_run_tool(

t_text platform,t_list command,t_text stdin = "",t_text stdout_treatment = "report",t_text stderr_treatment = "report",t_text status_treatment = "report_nocache",t_text signal_treatment = "report_nocache",t_int fp_content = -2,t_text wd = ".WD",t_bool existing_writable = FALSE)

run tool is themechanismby whichexternalprogramslikecompilersandlink-ersareexecutedfrom a VestaSDL program.It providesfunctionalitythat is fairlyplatform-independent.The following description,however, is somewhat Unix-specific(for example,in its descriptionof exit codesandsignals).

The platform argumentspecifiesthe platform on which the tool is to be exe-cuted. run tool selectsa specificmachinefor the given platform. The legalvaluesfor platform and the mechanismby which a machineof the appropriateplatformis chosenareimplementationdependent.

Thetool to beexecutedis specifiedby thecommandargument.This argumentis a t list of t text values. The first memberof the list is the nameof the tool(interpretationof thenameis discussedbelow); theremainingmembersof thelistaretheargumentspassedto thetool asits commandline. Thetool is executedonthespecifiedplatform in anenvironmentwith thefollowing characteristics:

s Thefile systemis encapsulatedsothatabsolutepaths(i.e., thosebeginningwith a Delim) areinterpretedrelative to ./root , where‘.’ is the implicitfinal parameterto run tool . Non-absolutepathsareinterpretedrelativeto./root/$ wd, wherewd is a parameterto run tool . Theinterpretationof filenamesis discussedin moredetailbelow.

s The environmentvariablesare taken from ./envVars , where‘.’ is theimplicit final parameterto run tool .

s Thecontentof standardinput is thevalueof thestdin run tool parameter.

s Treatmentof standardoutput and standarderror is specifiedby the std-out treatmentandstderr treatmentparameters.Theseparametersmay beoneof the t text values"ignore" , "report" , "report nocache" ,"value" , or "report value" . If the treatmentis "ignore" , any

205

Page 218: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

byteswritten to the correspondingoutputstream(stdoutor stderr)aredis-carded.If thetreatmentis "report" , thecorrespondingoutputismadevis-ible to theuser. If thetreatmentis "report nocache" , thecorrespondingoutputis madevisible to theuserand,if it is not empty, theevaluatordoesnot cachethe resultof the run tool call. If the treatmentis "value" ,theoutputstreamis convertedto aVestavalueof typet text andreturnedaspartof the run tool result,asdescribedbelow. If thetreatmentis "re-port value" , theoutputstreamis bothmadevisible to theuserandalsoreturnedaspartof theresult.

s Thestatustreatmentandsignal treatmentargumentsmaytake on thet textvalue"report" or "report nocache" . Regardlessof theirvalues,thecode andsignal fieldsof theresultvaluewill besetasdescribedbelow. Ifthevalueof statustreatmentis "report nocache" , this run tool callwill notbecachedif theresultcode isnonzero;similarly, if signal treatmentis "report nocache" , the run tool call will not becachedif there-sultsignal is nonzero.Additionally, in our implementation,a runtoolcallthatis notcachedbecauseof its returncodeor signalis consideredaruntimeerror andhalts the evaluationwith an error message,unlessthe -k (“keepgoing”) flag is givenon theevaluatorcommandline.

s The fp contentargumentcontrolshow fingerprintsareassignedto any de-rived files createdby the tool execution,includingderived files createdforstdoutor stderrwhenthe valueof the stdouttreatmentor stderr treatmentparameteris “value”. A valueof -1 causesthe fingerprintsof all suchde-rived files to be computeddeterministicallyfrom their contents. A non-negative fp contentvalueof x causesonly thosefiles lessthan x bytesinlength to have their fingerprintscomputedfrom the file contents;an arbi-trary uniquefingerprintis chosenfor files at leastx bytesin length.Hence,a value of 0 causesall derived files to be assignedarbitrary fingerprints.Setting fp contentto -2 selectsa site-dependentdefault value (set by the[Evaluator]/FpContentconfigurationvariable,in our implementation).ThebooleanvaluesTRUEandFALSE areacceptedassynonyms for -1 and0,respectively.

The cost of fingerprintinga file’s contentsis non-trivial (approximately1secondpermegabyteon theprototypeimplementation),but doingsoallowsfor cachehits in caseswheretwo evaluationsdependonavaluethatis iden-tical, but wascomputedin two differentways.

s The existing writable argumentcontrols whetherthe tool is permittedtowrite to files that alreadyexist in its encapsulatedfile systemwhen it is

206

Page 219: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

started.If theargumentis TRUE, suchfiles maybeopenedfor writing andwritten to; if it is FALSE, they maynot. For technicalreasonsin theNFS-basedrepositoryimplementation,toolswill getmuchbetterfile systemper-formancewhenexisting writable is FALSE. It shouldbe setto TRUEonlyfor toolsthatrequireit.

The run tool primitive returnsa binding that containsthe resultsof thecommandexecution.Thisbindinghastype:

type run_tool_result = binding [code : int,signal : int,stdout_written : bool,stderr_written : bool,stdout : text,stderr : text,root : binding

]

If r is of typerun tool result,then:

s r/code is anintegervaluethatcharacterizeshow thecommandterminated(i.e., theexit statusof theUnix process).

s r/signal is an integer value identifying the Unix signal that terminatedtheprocess,or 0 if theprocessexited voluntarily.

s r/stdout written and r/stderr written indicate whetherdatawaswritten to thestdoutandstderrstreams,respectively.

s r/stdout is definedif andonly if the stdouttreatment run tool pa-rameteris "value" or "report value" , in which caseit containsthebyteswritten to stdout.

s r/stderr is definedif andonly if the stderr treatment run tool pa-rameteris "value" or "report value" , in which caseit containsthebyteswritten to stderr.

s r/root is a binding containingall files createdby the commandthat areextant upon exit. Seethe descriptionunder“File SystemEncapsulation”below for details.

Two finepointsrelatingto theresultsof run tool :

207

Page 220: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

1. If the tool cannotbe invoked—for example,becauseof errors in the pa-rametersto run tool —theevaluatoralwaysprintsa diagnosticandhaltswith a runtime error. However, errorsthat occur during the executionofthe tool arereportedin a tool-specificfashion,asdiscussedunder"sta-tus treatment" and"signal treatment" above.

2. If "report nocache" is specifiedasthe treatmentfor anoutputstream(stdoutor stderr)or the exit or signalstatus,the evaluatorwill not make acacheentry for the run tool call if any outputis producedon thecorre-spondingoutputstreamor if theexit or signalstatusis nonzero,respectively.In addition,noneof theancestorfunctionsof thefailing run tool call inthecall grapharecachedeither. Sinceno cacheentriesaremade,a subse-quentre-interpretationof themodelwill producethesameoutput(onstdoutor stderr).Thiscanbeusefulfor reproducingerrormessagesfromacompileror otherexternaltool thataredisplayedthroughtheVestauserinterface.

File SystemEncapsulation:

s Whenthecommandprocess(or any subprocessit creates)executesa Unixsystemcall that includesa file pathasa parameter, thefile pathis translatedinto a referenceinto the‘.’ bindingthatis thelastparameterto run tool .

s Thepathis interpretedrelative to ./root if it is absolute(i.e., if it beginswith “/”), andrelative to ./root/$wd otherwise,where$wd is thevalueof the wd parameterto run tool . Eachcomponentof the path—exceptpossiblythefinal one—mustnameaVestabinding.Theinterpretationof thefinal componentof thepathdependson thesemanticsof thesystemcall. Ifthesystemcall expectsanextantfile, thefinal componentmustnameaVestavalueof typet text. If thesystemcall expectsanextantdirectory, theVestavaluemustbeof typet binding. If thesystemcall expectsanunboundname,thenamemustnotbeboundby thebindingcorrespondingto thepenultimatepathcomponent.

s A file createdor modified by the commandprocess(or a subprocess)re-mainsvisible in the namespacethroughoutthe remainderof the process’sexecution(or until deleted),just asin a regularfile system.This is achievedby modelingfile creation,modification,anddeletionasa suitableoverlay-ing of ./root . For example,if theprocesscreates“foo.o” in its workingdirectory, thishastheeffectof:

./root/$wd += [ foo.o = <bytes of file> ];<subsequent execution of the command process>

208

Page 221: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

s File modificationis handledin exactly the sameway. For example,if theprocessopenstheexisting file “foo.db” in its working directoryandwritesto it, this hastheeffect of:

./root/$wd += [ foo.db = <new contents of file> ];<subsequent execution of the command process>

Notethatmodificationof preexistingfilesis forbiddenif theexisting writableargumentto run tool is setto FALSE (its default value).

s File deletionsaremodeledsimilarly, but thefilesareremovedfrom thecon-text using the binding difference(-) operator, insteadof addedusing thebindingoverlay(+) operator.

s Whenthe commandprocessexits, the accumulatedeffectsof the file cre-ationsanddeletionsit hasperformedarereturnedaspartof the run toolresult(in r/root ). In thisbinding,thenamesof filesdeletedby thetool areboundto false. Suchnamescorrespondeitherto filesthatexistedin ./rootbeforethetool wasinvoked,or to files createdandsubsequentlydeletedbythetool.

Thus,if ./root representsthestateof thefile systemvisible to thecom-mandprocessatthetimeit is launched,thenthestateof thefile systemwhenit exits canbedescribedas:

./root ++ r/root

So, if the invoker of run tool wantedto update./root to reflect thechangesmadeby calling run tool , thecodemight look like this:

r = _run_tool( <suitable parameters> );new_fs = ./root ++ r/root;. += [ root = new_fs ];

After the last assignment,namesin ./root boundto false arefiles thatwere deletedby the tool. Here is a recursive function for removing suchfiles:

remove_deleted(b: binding): binding{

res: binding = [];foreach [ n = v ] in b do

res += if v = false then [] elseif _is_binding(v)

209

Page 222: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

then [ $n = remove_deleted(v) ]else [ $n = v ];

return res;};

A.3.4.9 DiagnosticFunctions

t_value_print(t_value v, t_int deps = 0, t_bool verbose = FALSE)

Print thevaluev to standardoutputfollowedby anewline, andreturnv. Whatgetsprinteddependson v’s type. If v is of type t err, ERRis printed. If v is of typet bool,TRUEor FALSE is printed.If v is of typet int, its decimalvalueis printed.

Theprintedrepresentationof a t text valueis <file 0x XXXXXXXX> if ver-boseis falseandthetext is representedby abackingfile, in whichcaseXXXXXXXXis the file’s hexadecimalidentifier. Otherwise,it is the text value’s contentsen-closedin doublequotes.

Theprintedrepresentationof a t list valuecontainingthevalues{ 1 wH{ 2 wH}�}�}DwH{ kis < p1 w p2 wH}�}�}�w pk >, wherepi denotestheprintedrepresentationof thevalue { i .

The printedrepresentationof a t binding valuecontainingthe (name,value)pairs z n1 wH{ 1| wHz n2 wH{ 2 | wH}�}�}DwHz nk wH{ k | is [ n1 ~ p1 wH}�}�}Dw nk ~ pk ] , whereagainpi

denotestheprintedrepresentationof thevalue { i .Theprintedrepresentationof at closurevalueis <Model name> if theclosure

is representedby a model,in which casenameis a namefor themodelfile in therepository. Otherwise,if verboseis true, it is the completelist of formals,body,andcontext; if not it is simply <Closure> .

If depsis greaterthanzero,thevalue’s dependenciesarealsoprinted. In thecurrentimplementation,valuesof 1 and2 provide different levels of detail. Thisfeatureis meantfor debuggingtheevaluatoritself.

Typically, print is usedfor debuggingpurposes,andits result is ignored.However, it is importantto rememberthat print is a function,not a statement.Hence,onecannotsimplywrite:

_print(v);

insidea functionbody. Instead,the call to print mustbe usedin a functionalway, suchas:

dummy = _print(v);

Notealsothatefficient implementationsof theVestalanguagewill cachefunctionresultsandre-usethosecachedresultswhenever it is safeto do so. Calls to the

210

Page 223: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

print function itself arenever cached.However, the print function’s sideeffect of printing to the terminal is not repeatedwhenever the call to print isskippeddueto ahigherlevel hit on thefunctioncache.

t_bool_assert(t_bool cond, t_value msg)

If the valuecond is true, return true. Otherwise,print the valuemsgaswith theprint primitive, thenterminatetheevaluationwith a runtimeerror. As a diag-

nosticaid,our implementationallows thecontext of a falseassertionand/orastacktraceto be printedaswell, if selectedby command-lineoptionsto the evaluator.Notethat,like print , assert is a function,notastatement.

A.4 ConcreteSyntax

A.4.1 Grammar

Models:

Model ::= Files Imports Block

FilesClauses:

Files ::= FileClause*FileClause ::= files FileItem*;FileItem ::= FileSpec | FileBindingFileSpec ::= [ Arc = ] DelimPathFileBinding ::= Arc = ‘[’ FileSpec*, ‘]’

Import Clauses:

Imports ::= ImpClause*ImpClause ::= ImpIdReq | ImpIdOptImpIdReq ::= import ImpItemR*;ImpItemR ::= ImpSpecR | ImpListRImpSpecR ::= Arc = DelimPathImpListR ::= Arc = ‘[’ ImpSpecR*, ‘]’ImpIdOpt ::= from DelimPath import ImpItemO*;ImpItemO ::= ImpSpecO | ImpListOImpSpecO ::= [ Arc = ] Path [ Delim ]ImpListO ::= Arc = ‘[’ ImpSpecO*, ‘]’

211

Page 224: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Pathsand Ar cs:

DelimPath ::= [ Delim ] Path [ Delim ]Path ::= Arc { Delim Arc }*Arc ::= Id | Integer | Text

Blocks and Statements:

Block ::= ‘{’ Stmt*; Result; ‘}’Stmt ::= Assign | Iterate | FuncDef | TypeDefResult ::= { value | return } Expr

AssignmentStatements:

Assign ::= TypedId [ Op ] = ExprOp ::= AddOp | MulOpAddOp ::= + | ++ | -MulOp ::= *

Iteration Statements:

Iterate ::= foreach Control in Expr do IterBodyControl ::= TypedId | ‘[’ TypedId = TypedId ‘]’IterBody ::= Stmt | ‘{’ Stmt+; ‘}’

Function Definitions:

FuncDef ::= Id Formals+ [ TypeQual ] BlockFormals ::= ( FormalArgs )FormalArgs ::= TypedId*,

| { TypedId = Expr }*,| TypedId { , TypedId }* { , TypedId = Expr }+

Expressions:

Expr ::= if Expr then Expr else Expr | Expr1Expr1 ::= Expr2 { => Expr2 }*Expr2 ::= Expr3 { || Expr3 }*Expr3 ::= Expr4 { && Expr4 }*Expr4 ::= Expr5 [ { == | != | < | > | <= | >= } Expr5 ]Expr5 ::= Expr6 { AddOp Expr6 }*Expr6 ::= Expr7 { MulOp Expr7 }*Expr7 ::= [ UnaryOp ] Expr8UnaryOp ::= - | !Expr8 ::= Primary [ TypeQual ]

212

Page 225: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Primary ::= ( Expr ) | Literal | Id | List| Binding | Select | Block | FuncCall

Binaryoperatorswith equalprecedenceareleft-associative.

Literals:

Literal ::= ERR | TRUE | FALSE | Text | Integer

Lists:

List ::= < Expr*, >

Bindings:

Binding ::= ‘[’ BindElem*, ‘]’BindElem ::= SelfNameB | NameBindSelfNameB ::= IdNameBind ::= GenPath = ExprGenPath ::= GenArc { Delim GenArc }* [ Delim ]GenArc ::= Arc | $ Id | $ ( Expr ) | % Expr %

Binding Selections:

Select ::= Primary Selector GenArcSelector ::= Delim | !

Function Calls:

FuncCall ::= Primary ActualsActuals ::= ( Expr*, )

TypeDefinitions:

TypeDef ::= type Id = TypeTypedId ::= Id [ TypeQual ]TypeQual ::= : TypeType ::= any | bool | int | text

| list [ ( Type ) ]| binding ( TypeQual )| binding [ ( TypedId*, ) ]| function { ( TypedForm*, ) }* [ TypeQual ]| Id

TypedForm ::= [ Id : ] Type

213

Page 226: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

A.4.2 Ambiguity Resolution

Thegrammarasgivenabove is ambiguous.Weresolve theambiguityasfollows.TheVestaparseracceptsa modifiedgrammarin which the> tokenis replaced

by two distinct tokens: GREATER in theproductionfor Expr4andRANGLE intheproductionfor List. Themodifiedgrammaris unambiguousandcaneasilybeparsedby anLL(1) or LALR(1) automaton.

TheVestatokenizeris responsiblefor disambiguatingbetweenGREATERandRANGLE wherever > appearsin theinput. It doessoby lookingaheadto thenexttokenafterthe>. If thenext tokenis oneof

- ! ( ERR TRUE FALSE Text Integer Id < [ {

thenthe> is takenasGREATER;otherwise,it is takenasRANGLE.Why is this solutionreasonable?Inspectionof the grammarshows that in a

syntacticallyvalid program,thenext tokenafterGREATER mustbeoneof thosein thelist above. Thenext tokenafterRANGLE mustbeoneof thefollowing:

: * + ++ - == != < GREATER<= >= && || =>; do , ) then else RANGLE] % / \ ! (

Thesesetsoverlapin the tokens- , ! , ( , and<. Becausewe have chosentoresolve thesecasesas GREATER, it is impossibleto write certainsyntacticallyvalid programscontainingRANGLE. However, any suchprogramcanberewrittenby replacingevery List nonterminalby ( List ) , yielding a semanticallyequiva-lent programin which theclosing> of theList is correctlyresolvedasRANGLE.Moreover, weclaim(withoutpresentingaproof) thatany programin whichRAN-GLE is followedby - , ! , ( , or < musthavearuntimetypeerror, dueto thepaucityof operatorsdefinedonthelist type,soin practicesuchprogramsareneverwritten.

A.4.3 Tokens

Table A.10 gives a BNF descriptionof the tokensof the language. The tokenclassesDelim, Integer, Id, andText, andtheindividual tokensin theclassesPunc,TwoPunc,andKeyword,serve asterminalsin theBNF of earliersections.

We defineNewline asan ASCII new line sequence,eitherCR, LF, or CRLF.NonNewlineCharis any ASCII characterotherthanCR andLF. CommentBodyisany sequenceof ASCII charactersthatdoesnotcontain‘*/’. Tabis theASCII TABcharacter.

Theambiguitiesin the token grammarareresolved asfollows. The tokenizerinterpretstheprogramasaTokenSeq.It scansfrom left to right, repeatedlymatch-ing the longestpossibleTokenbeginningwith thenext unmatchedcharacter. The

214

Page 227: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

TokenSeq ::= Token*Token ::= Integer | Id | Text | Punc

| TwoPunc | Keyword | Whitespace| Comment

Delim ::= / | \

Integer ::= DecimalNZ Decimal*| 0 Octal* | 0 { x | X } Hex+

Decimal ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9DecimalNZ ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9Octal ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7Hex ::= Decimal

| A | B | C | D | E | F| a | b | c | d | e | f

Id ::= { Letter | Decimal | IdPunc }+Letter ::= A | B | C | D | E | F | G | H | I | J | K

| L | M | N | O | P | Q | R | S | T | U | V| W | X | Y | Z| a | b | c | d | e | f | g | h | i | j | k| l | m | n | o | p | q | r | s | t | u | v| w | x | y | z

IdPunc ::= . | _

Text ::= " TextChar* "TextChar ::= Decimal | Letter | Punc | EscapePunc ::= ˜ | ‘ | ! | @ | # | $ | % | ˆ | & | * | ( | )

| _ | - | + | = | ‘{’ | ‘[’ | ‘}’ | ‘]’ | : | ;| ‘|’ | ’ | , | < | . | > | ? | / | Space

Escape ::= \ EscapeCharEscapeChar ::= n | t | v | b | r | f | a | \ | "

| Octals | HexesOctals ::= Octal [ Octal [ Octal ] ]Hexes ::= { x | X } Hex [ Hex ]

TwoPunc ::= ++ | == | != | <= | >= | => | || | &&

Keyword ::= binding | do | else | ERR | FALSE | files| foreach | from | function | if | in | import| list | return | then | type | TRUE | value

Whitespace ::= ‘ ’ | Tab | Newline

Comment ::= // NonNewlineChar* Newline| ‘/*’ CommentBody ‘*/’

TableA.10: BNF for thetokensof theVestaSDL.

215

Page 228: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

tokensWhitespaceandCommentarediscardedafter matching;other tokensarepassedon for parsingby themaingrammar. Whena stringof charactersmatchesbothIntegerandId, it is tokenizedasInteger. WhenastringmatchesbothKeywordandId, it is tokenizedasKeyword.

A.4.4 Reserved Identifiers

HerearetheVestaSDL’sreservedidentifiers;they shouldnotberedefined:

_append _assert _bind1 _defined _div _elem _find _findr_fingerprint _head _is_binding _is_bool _is_closure_is_err _is_int _is_list _is_text _length _list1 _lookup_map _max _min _mod _model_name _n _par_map _print_run_tool _same_type _self _sub _tail _type_of _v

216

Page 229: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Acknowledgements

TheVestaprojecthasextendedover many years,andtherehave beenmany con-tributorsandsupporters.

We begin by thankingthosewho participatedin the developmentof Vesta-1,includingBobAyers,Mark R. Brown, Sheng-YangChiu,JohnEllis, ChrisHanna,andPaulMcJones.

ButlerLampsonwasoneof theprimarydesignersof theVesta-2functioncacheandweeder. He alsocontributedto the initial designof thesystemmodelinglan-guageandrepository. Butler, Jim Horning, andMartın Abadi helpeddesigntheevaluator’s fine-graineddependency algorithm. Togetherwith Chris Hanna,Jimalsoparticipatedin thedesignof thesystemmodelinglanguageandtheinitial im-plementationof theevaluator.

Bill McKeeman’ssuggestionshelpedusto makethemodelinglanguagesyntaxsimplerandmorereadable.Our fingerprintpackageis basedon codeand ideasfrom Andrei Broder. Jeff Mogul andMike Burrows helpedtrack down a seriousperformanceproblemin our RPCimplementation.ChanduThekkathhelpedwithNFS performanceproblemsandgave helpful commentson a draft of this report.GunSirerimplementedtheModula-3bridgeandmadeseveralimprovementsto theperformanceof theentiresystem.Mark Lillibridge gaveusmany usefulcommentson an earlier draft of Appendix A, while Cynthia Hibbard proofreada draft oftheentiremanuscriptandprovidednumerouscorrections.Neil Stratfordcodedanearlyversionof thereplicationtoolsandsomeof therepositorysupportfor them.

Tim Leonardinitiatedour contactwith theArana(Alpha microprocessor)de-velopmentgroup,which becameVesta’s first large usercommunity, andWalkerAndersonandJofordLim led thegroup’s initial evaluationof Vesta.Matt ReillyandKenSchalkweretheprimemoversin carryingtheAranaVestaeffort throughto adoptionandproductionuse. Now that we have moved on to otheractivities,Kenhasbecometheprimarymaintainerof theentireVestasystem.KenandMatthave alsodonethebulk of thework on theLinux port.

217

Page 230: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

Bibliography

[1] Martın Abadi, Butler Lampson,andJean-JacquesLevy. Analysis andcachingofdependencies.In Proceedingsof the1996ACM SIGPLANInternationalConferenceon FunctionalProgramming(ICFP ’96), pages83–91,May 1996.

[2] AccuRev, Inc. Accurev. http://www.accurev.com/i prod.html.

[3] Larry Allen, Gary Fernandez,KennethKane, David Leblang,DebraMinard, andJohnPosner. ClearCaseMultiSite: Supportinggeographically-distributedsoftwaredevelopment. In Software Configuration Management: ICSE SCM-4and SCM-5WorkshopsSelectedPapers, pages194–214,1995. Available as volume 1005 inLectureNotesin ComputerScience, Springer-Verlag.

[4] Atria Software,Inc., 24 PrimePark Way, Natick, MA 01760. ClearCaseConceptsManual, 1992.

[5] DaveBelanger, David Korn,andHermanRao. Infrastructurefor wide-areasoftwaredevelopment. In Proceedingsof the 6th InternationalWorkshopon Software Con-figurationManagement, pages154–165,1996.Availableasvolume1167in LectureNotesin ComputerScience, Springer-Verlag.

[6] A. D. Birrell, R. Levin, R. M. Needham,andM. D. Schroeder. Grapevine: An ex-ercisein distributedcomputing.Communicationsof theACM, 25(4):260–274,April1982.

[7] Andrew D. Birrell, MichaelB. Jones,andEdwardP. Wobber. A simpleandefficientimplementationfor small databases.In Proceedingsof the 11th ACM Symposiumon Operating SystemsPrinciples, pages149–154.The Associationfor ComputingMachinery, November1987.

[8] BitMover, Inc. Bitkeeper. http://www.bitkeeper.com/.

[9] RobertS.BoyerandJStrotherMoore.A ComputationalLogic Handbook. AcademicPress,1988.

[10] AndreiBroder. Someapplicationsof Rabin’sfingerprintingmethod.In R. Capocelli,A. De Santis,andU. Vaccaro,editors,SequencesII: Methodsin Communications,Security, andComputerScience, pages143–152.Springer-Verlag,1993.

218

Page 231: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

[11] Mark R. Brown and JohnR. Ellis. Bridges: Tools to extend the Vestaconfigu-ration managementsystem. SRC ResearchReport 108, Digital EquipmentCor-poration,SystemsResearchCenter, June1993. http://gatekeeper.research.compaq.com/pub/compaq/SRC/research-reports/abstracts/src-rr-108.html.

[12] ToddBrunhoff. makedepend(1)manualpage.

[13] Sheng-YangChiu andRoy Levin. TheVestarepository:A file systemextensionforsoftwaredevelopment.SRCResearchReport106,Digital EquipmentCorporation,SystemsResearchCenter, June1993. http://gatekeeper.research.compaq.com/pub/compaq/SRC/research-reports/abstracts/src-rr-106.html.

[14] ComputerAssociates.CCC/Harvest.http://www.cai.com/products/ccm/.

[15] Connectathontestsuites.http://www.connectathon.org/nfstests.html.

[16] Jacky EstublierandRubbyCasallas. The adeleconfigurationmanager. In WalterTichy, editor, Configuration Management, volume 2 of Trendsin Software, pages99–134.JohnWiley and SonsLtd., 1994. Available at http://www-adele.imag.fr/Les.Publications/BD/adele1994est.html.

[17] S. I. Feldman.Make—A programfor maintainingcomputerprograms.Software—PracticeandExperience, 9(4):255–265,April 1979.

[18] GlennFowler. A casefor make. Software—PracticeandExperience, 20(S1):S1/35–S1/46,June1990.

[19] FreeSoftwareFoundation.GNU lessergeneralpublic licence. http://www.fsf.org/licenses/lgpl.html.

[20] CaryG. GrayandDavid R. Cheriton.Leases:An efficient fault-tolerantmechanismfor distributedfile cacheconsistency. In Proceedingsof the 12th ACM Symposiumon Operating SystemsPrinciples, pages202–210.The Associationfor ComputingMachinery, December1989.

[21] Jim Gray andAndreasReuter. TransactionProcessing:Conceptsand Techniques.MorganKaufmannPublishers,SanMateo,CA, 1993.

[22] Dick Grune,BrianBerliner, andJeff Polk. cvs(1)manualpage. FreeSoftwareFoun-dation.

[23] Carl A. Gunter. Abstractingdependenciesbetweensoftware configurationitems.ACM TransactionsonSoftwareEngineeringandMethodology, 9(1):94–131,January2000.

[24] ChristineB. Hannaand Roy Levin. The Vestalanguagefor configurationman-agement.SRCResearchReport107,Digital EquipmentCorporation,SystemsRe-searchCenter, June1993.http://gatekeeper.research.compaq.com/pub/compaq/SRC/research-reports/abstracts/src-rr-107.html.

219

Page 232: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

[25] Allan Heydon,Roy Levin, TimothyMann,andYuanYu. TheVestaapproachto soft-wareconfigurationmanagement.SRCResearchReport168,CompaqComputerCor-poration,SystemsResearchCenter, March2001. http://gatekeeper.research.compaq.com/pub/compaq/SRC/research–reports/abstracts/src–rr–168.html.

[26] Allan Heydon,Roy Levin, andYuanYu. Cachingfunctioncallsusingprecisedepen-dencies. In Proceedingsof the ACM SIGPLAN2000Conferenceon ProgrammingLanguageDesignandImplementation, pages311–320,June2000.

[27] Allan HeydonandGreg Nelson. TheJuno-2constraint-baseddrawing editor. SRCResearchReport131a,Digital EquipmentCorporation,SystemsResearchCenter,December1994.http://gatekeeper.research.compaq.com/pub/compaq/SRC/research-reports/abstracts/src-rr-131a.html.

[28] JohnH. Howard,MichaelL. Kazar, SherriG. Menees,David A. Nichols,M. Satya-naraynan,RobertN. Sidebotham,andMichaelJ. West. Scaleandperformancein adistributedfile system.ACM TransactionsonComputerSystems, 6(1):51–81,Febru-ary 1988.

[29] Juno-2homepage.http://www.research.compaq.com/SRC/juno-2/.

[30] David B. LeblangandRobertP. Chase,Jr. Computer-aidedsoftwareengineeringin adistributedworkstationenvironment.SIGPLANNotices, 19(5):104–112,May 1984.

[31] David B. Leblang,RobertP. Chase,Jr., andGordonD. McLean,Jr. TheDOMAINsoftwareengineeringenvironmentfor large-scalesoftwaredevelopmentefforts. InProceedingsof the 1st InternationalConferenceon ComputerWorkstations, pages266–280,SanJose,CA, November1985.IEEE ComputerSociety, IEEE ComputerSocietyPress.ISBN 0-8186-0649-5,IEEE CatalogNumber85CH2228-5.

[32] Roy Levin andPaul R. McJones. The Vestaapproachto preciseconfigurationoflargesoftwaresystems.SRCResearchReport105,Digital EquipmentCorporation,SystemsResearchCenter, June1993. http://gatekeeper.research.compaq.com/pub/compaq/SRC/research-reports/abstracts/src-rr-105.html.

[33] J.MacDonald,P. N. Hilfinger, andL. Semenzato.PRCS:Theprojectrevisioncontrolsystem.In Proceedingsof the8th InternationalWorkshopon SystemConfigurationManagement, pages33–45,1998. Available as volume 1439 in Lecture NotesinComputerScience, Springer-Verlag,andathttp://citeseer.nj.nec.com/279027.html.

[34] Boris MagnussonandUlf Asklund. Fine grainedversioncontrol of configurationsin COOP/orm. In Proceedingsof the 6th InternationalWorkshopon SystemCon-figuration Management, pages31–48,1996. Availableat http://citeseer.nj.nec.com/magnusson96fine.html.

[35] Timothy Mann. Partial replication in the Vesta software repository. SRC Re-searchReport 172, Compaq ComputerCorporation, SystemsResearchCenter,August 2001. http://gatekeeper.research.compaq.com/pub/compaq/SRC/research–reports/abstracts/src–rr–172.html.

220

Page 233: Allan Heydon Roy Levin Timothy Mann - HP · PDF fileJanuary 22, 2002 SRC Research Report 177 The Vesta Software Configuration Management System Allan Heydon Roy Levin Timothy Mann

[36] Timothy Mann, Andrew Birrell, Andy Hisgen,CharlesJerian,and GarretSwart.A coherentdistributed file cachewith directory write-behind. SRC ResearchReport 103, Digital Equipment Corporation, SystemsResearchCenter, June1993. http://gatekeeper.research.compaq.com/pub/compaq/SRC/research-reports/abstracts/src-rr-103.html.

[37] PeterMiller. Aegis. http://www.canb.auug.org.au/millerp/aegis/aegis.html.

[38] Modula-3 home page. http://www.research.compaq.com/SRC/modula-3/html/home.html.

[39] Greg Nelson,editor. SystemsProgrammingwith Modula-3. Seriesin InnovativeTechnology. PrenticeHall, EnglewoodCliffs, New Jersey 07632,1991.

[40] JohnOusterhout.Why aren’t operatingsystemsgettingfasterasfastashardware?In Proceedingsof theSummerUSENIXConference, pages247–256,Anaheim,CA,Summer1990.

[41] PerforceSoftware.Perforce.http://www.perforce.com/perforce/products.html.

[42] M. O.Rabin.Fingerprintingby randompolynomials.ReportTR–15–81,Departmentof ComputerScience,HarvardUniversity, 1981.

[43] MarcJ.Rochkind.Thesourcecodecontrolsystem.IEEE Transactionson SoftwareEngineering, SE–1(4):364–370,December1975.

[44] RusselSandberg, David Goldberg, Steve Kleiman,DanWalsh,andBob Lyon. De-signandimplementionof theSunnetwork filesystem.In Proceedingsof theSummerUSENIXConference, pages119–130,June1985.

[45] KennethC. Schalk.VestaSDL Programmer’sReference.http://doc/sdl-ref/.

[46] Telelogic AB. Telelogic CM Synergy (formerly Continuus/CM). http://www.telelogic.com/products/synergy/.

[47] W. Tichy. Design,implementation,andevaluationof a revision control system. InProceedingsof the6thInternationalConferenceonSoftwareEngineering, pages58–67.IEEE ComputerSocietyPress,1982.

[48] W. Tichy. RCS—Asystemfor versioncontrol. Software—PracticeandExperience,15(7):637–654,July1985.

[49] TRUE Software. TRUEchange(formerly known as Aide-De-Campor ADC).http://www.truesoft.com/.

[50] Andre van der Hoek, Antonio Carzaniga,Dennis Heimbigner, and AlexanderL.Wolf. A testbedfor configurationmanagementpolicy programming.IEEE Transac-tionsonSoftwareEngineering. To appear. Availablefrom http://www.ics.uci.edu/an-dre/.

[51] VestaHomePage.http://www.vestasys.org.

221