Partitioning in Avionics Architectures: Requirements ... · PDF filePartitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance John Rushby Computer Science Laboratory

Partitioning in Avionics Architectures:Requirements, Mechanisms, and Assurance

�

JohnRushbyComputerScienceLaboratory

SRI InternationalMenloParkCA 94025USA

TechnicalReportMarch1999

�This work was sponsoredby the FAA TechnicalCenterand NASA Langley ResearchCen-

ter throughcontractNAS1-20334,andby DARPA andNSA throughAir ForceRomeLaboratoryContractF30602-96-C-0204.

ii

Abstract

Automatedaircraft control hastraditionally beendivided into distinct functionsthatareimplementedseparately(e.g.,autopilot,autothrottle,flight management);eachfunctionhasits own fault-tolerantcomputersystem,anddependenciesamongdifferent functionsare generallylimited to the exchangeof sensorand control data. A by-productof this“federated”architectureis thatfaultsarestronglycontainedwithin thecomputersystemofthefunctionwherethey occurandcannotreadilypropagateto affect theoperationof otherfunctions.

More modernavionics architecturescontemplatesupportingmultiple functionson asingle,shared,fault-tolerantcomputersystemwherenaturalfault containmentboundariesarelesssharplydefined.Partitioning usesappropriatehardwareandsoftwaremechanismsto restorestrongfault containmentto suchintegratedarchitectures.

This reportexaminesthe requirementsfor partitioning,mechanismsfor their realiza-tion, andissuesin providing assurancefor partitioning. Becausepartitioningsharessomeconcernswith computersecurity, securitymodelsare reviewed and comparedwith theconcernsof partitioning.

iii

Acknowledgments

I wish to acknowledgemy deepappreciationfor thesupportof PeteSaraceniof theFlightSafetyResearchBranchof theWilliam J.HughesFAA TechnicalCenterandRicky Butlerof theNASA Langley ResearchCenter. As oftenbefore,they providedencouragementandexcellenttechnicaladvicethroughoutpreparationof thisreportanddisplayedstoicpatience.

I amalsovery gratefulto otherresearchersandto engineersin severalaerospacecom-panieswho took the time to explain their concernsandapproachesto me. I particularlybenefitedfrom extensive discussionswith David Hardin,DaveGreve,andMatt Wilding ofCollins; with Kevin Driscoll andAaronLarsenof Honeywell; andwith HermannKopetzof theViennaTechnicalUniversity. Bendi Vito of NASA Langley providedexcellentcom-mentsandposedtaxingquestionsonadraft versionof this report.

iv

Contents

1 Motivation and Introduction 1

2 Informal Requirements 32.1 IntegratedModularAvionics . . . . . . . . . . . . . . . . . . . . . . . . . 32.2 Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Issues and Mechanisms 133.1 PartitioningWithin aSingleProcessor. . . . . . . . . . . . . . . . . . . . 13

3.1.1 SpatialPartitioning . . . . . . . . . . . . . . . . . . . . . . . . . . 133.1.2 TemporalPartitioning . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2 PartitioningAcrossaDistributedSystem. . . . . . . . . . . . . . . . . . . 293.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4 Comparison With Computer Security 374.1 DataandInformationFlow . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.1.1 AccessControl . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384.1.2 Noninterference. . . . . . . . . . . . . . . . . . . . . . . . . . . . 404.1.3 Separability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.2 Integrity Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.3 Timing ChannelsandDenialof Service . . . . . . . . . . . . . . . . . . . 484.4 Applicationto Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5 Conclusion 51

References 53

v

List of Figures

2.1 Alternative OperatingSystem/PartitioningDesigns . . . . . . . . . . . . . 7

3.1 DifferentOperatingSystemSoftwarefor DifferentPartitions . . . . . . . . 28

4.1 AllowedInformationFlows for anEncryptionController . . . . . . . . . . 43

vi

Chapter 1

Motivation and Introduction

Digital flight-controlfunctionsin currentaircraftaregenerallyimplementedby a federatedarchitecturein which eachfunction(e.g.,autopilot,flight management,yaw damping,dis-plays)hasits own computersystemthatis only looselycoupledto thecomputersystemsofotherfunctions.A greatadvantageof thisarchitectureis thatfault containmentis inherent:thatis to say, a fault in thecomputersystemsupportingonefunction,or in thesoftwareim-plementingthat function, is unlikely to propagateto otherfunctionsbecausethereis verylittle that is sharedacrossthedifferentfunctions.To besure,somefunctionsinteractwithothers,but theseinteractionsareaccomplishedby theexchangeof data,andfunctionscanbedesignedto detectandtoleratea faultyor erraticdatasource.

The obvious disadvantageto the federatedapproachis its profligateuseof resources:eachfunctionneedsits own computersystem(which is generallyreplicatedfor fault toler-ance),with all theattendantcostsof acquisition,space,power, weight,cooling,installation,andmaintenance.IntegratedModularAvionics (IMA) hasthereforeemergedasa designconceptto challengethe federatedarchitecture[1,78]. In IMA, a singlecomputersystem(with internalreplicationto providefaulttolerance)providesacommoncomputingresourceto severalfunctions.As asharedresource,IMA hasthepotentialto diminishfault contain-mentbetweenfunctions:for example,a faulty functionmight monopolizethecomputerorcommunicationssystem,denying serviceto all theotherfunctionssharingthatsystem,orit might corrupt the memoryof other functionsor sendinappropriatecommandsto theiractuators.It is almostimpossiblefor individual functionsto protectthemselvesagainstthiskind of corruptionto the computationalresourceon which they depend,so any realiza-tion of IMA mustprovide partitioning to ensurethatthesharedcomputersystemprovidesprotectionagainstfault propagationfrom onefunctionto anotherthat is equivalentto thatwhich is inherentto thefederatedarchitecture.

Thepurposeof this reportis to identify therequirementsfor partitioningin IMA andtoexploretopicsin achieving thoserequirementswith veryhighassurance.Thenext chapter,therefore,is concernedwith the generalrequirementsfor IMA andpartitioning,and theonefollowing with issuesin theimplementationof IMA andthemechanismsfor partition-

1

2 Chapter1. MotivationandIntroduction

ing. Thediscussionin thesechaptersis deliberatelymoregeneralthanthat in ARINC 651(“DesignGuidancefor IntegratedModularAvionics”) [1]: the ARINC documentreflectsaircraft practice;whereas,we take a computerscienceperspective—in the hopethat thiswill casta new or different light on the issues. For this reason,our terminologyis alsomoregeneric(e.g.,we speakof processorsandotherbasiccomponentsratherthanline re-placeablemodules(LRMs)) andsoarethecomponentpropertiesthatweconsider(e.g.,weconsiderbusesin general,not justavionic busessuchasARINC 629[3]). In Chapter4, weconsidermethodsdevelopedfor specifyingandanalyzingcomputersecuritypolicies,sincethesesharesomeconcernswith partitioningandhavebeentheobjectof considerablestudy.Weendwith conclusionsandsuggestionsfor futurework.

Chapter 2

Informal Requirements

To gaininsightinto therequirementsfor partitioning, wefirst needto examinethecontextprovidedby IMA andrelateddevelopmentsin avionics.

2.1 Integrated Modular Avionics

It canbearguedthatthesimplestinterpretationof IMA envisionsanarchitecturethattech-nology hasalreadyrenderedobsolete:an embeddedsystemsversionof the centralizedtime-shared“mainframe.” Thanksto recenttechnologicaldevelopments,powerful proces-sors,large memories,andhigh-bandwidthlocal communicationsareall availableasreli-ableand inexpensive commodityitems, and thesedevelopmentssurely favor lessratherthanmorecentralization. Thus, this argumentproceeds,a modernavionics architectureshouldbe more,not less,federated,with existing functions“deconstructed”into smallercomponents,andeachhaving its own processor.

Thereis someplausibilityto thisargument,but thedistinctionbetweenthe“morefeder-ated”architectureandcentralizedIMA provesto bemootoncloserinspection.A federatedarchitectureis onewhosecomponentsareverylooselycoupled—meaningthatthey canop-eratelargely independently. But thedifferentelementsof a function—forexample,verticalandhorizontalflight pathcontrol in an autopilot—usuallyarerathertightly coupled(andit is arguedbelow that they shouldbecomeevenmoretightly coupled)so that thedecon-structedfunctionwould not bea federatedsystemsomuchasa distributedone—meaninga systemwhosecomponentsmay be physicallyseparated,but which mustcoordinatetoachieve somecollective purpose.Dually, a centralizedIMA architecturewould not be asimplemainframe—fora computersystemsupportingflight functionsmustprovide repli-catedandphysicallydistributedhardwarefor fault tolerancetogetherwith mechanismsforredundancy management.Consequently, a conceptuallycentralizedarchitecturewill be,internally, a distributedsystem,andthe basicservicesthat it provideswill not differ in asignificantway from thoserequiredfor themorefederatedarchitecture.

3

4 Chapter2. InformalRequirements

Anothercontrarianpointof view is thatneithercentralizedIMA nor themorefederatedarchitectureofferssignificantbenefitsover currentpractice;thepresentfederatedarchitec-turehasbeenvalidatedby experience,andmodernhardwaretechnologywill reduceits costpenalty—sothereis no reasonto changeit. Theargumentagainstthis pointof view is thatit takesa very narrow interpretationof the costsassociatedwith the currentarchitectureandthereforegrosslyunderestimatesthem. Oneneglectedcostis safety:thefederatedar-chitecturehasthe advantageof naturalfault containment,but it imposesa cost in poorlycoordinatedcontrolandcomplex andfault-pronepilot interfaces.

Thecurrentallocationof flight automationto separatefunctionsis theresultof largelyaccidentalhistoricalfactors.Consequently, certaincontrolvariablesthataretightly coupledin a dynamicalsenseare managedby different functions: for example,enginethrust ismanagedby the autothrottleandpitch angleby theautopilot. Sincea changein eitherofthesevariablesaffectstheotherandthereis no higher-level functionthatmanagesthemina coordinatedmanner, suchconceptuallysimpleservicesascruisespeedcontrol,altitudeselect,andverticalspeedhave complex andimperfectimplementationsthataredifficult tomanage.For example,Lambregts[59, page4] reports:

“Becausetheactionsof theautothrottlearenot tacticallycoordinatedwith theautopilot, the autothrottlespeedcontrol constantlyupsetsthe autopilotflightpathcontrolandvice versa,resultingin a notoriouscouplingproblemfamil-iar to every pilot. It manifestsitself especiallywhenexcitedby turbulenceorwindshear, to the point wherethe trackingperformanceandride quality be-comesunacceptable.Theold remedyto breakthecouplingwasto changetheautopilotmodeto ALTITUDE HOLD (e.g.,theolderB747-200/300).Onnewerairplanes,this problemhasbeenreducedto anacceptablelevel for thecruiseoperationaftera very difficult andcostlydevelopmentprocess,implementingprovisionssuchasseparationof the control frequency by going to very lowautothrottlefeedbackgain, applicationof ‘energy compensation,’ turbulencecompensation,andnonlinearwindsheardetections/compensation.”

And again:

“Due to the lack of propercontrolcoordination,theautopilotALTITUDE SE-LECT and VERTICAL SPEED modesnever functionedsatisfactorily. . . theseproblemsresultedin developmentof the FLIGHT LEVEL CHANGE (FLC)modethatwasfirst implementedon theB757/B767.. .however themodelogicdependson certainassumptionsthat arevalid only for certainoperations,sothe logic canbe tricked andcausean incorrector poorly coordinatedcontrolresponse.. .asa result therehave beena numberof incidentswherethe FLCmodedid notproperlyexecutethepilot’s command.”

Thelackof properlyintegratedcontrolcausedby theartificial separationof functionsin thefederatedarchitectureis oneof thefactorsthat leadsto thecomplex modesandsubmodes

2.1. IntegratedModularAvionics 5

usedin thesefunctionsandthenceto the “automationsurprises”and“mode confusions”thatcharacterizeproblemsin the“flightcrew-automationinterface.” Numerousfatalcrashesandother incidentsareattributed to suchhumanfactorsproblems[27, AppendixD], butit is clear from their origins in the artificial separationof functionsthat theseproblemsareunlikely to be solved by local improvementsin the interfacesandcuespresentedtopilots. Theplethoraof modes,submodes,andtheir correspondinginteractionsalsoexactsa high cost in development,implementation,andcertification. If this analysisis correct,thetraditionalfederatedarchitectureis a majorobstacleto a morerationalorganizationofflight functions,andIMA is thebesthopefor removing thisobstacle.

The topicsconsideredso far suggestthat theappropriatecontext in which to examinepartitioningfor IMA is adistributedsystemin which flight functions(whichmight well bedefinedandsubdivided differently thanin the traditionalfederatedarchitecture)areeachallocatedto separateprocessors(replicatedasnecessaryfor fault tolerance).In this model,we would needto considerpartitioningto limit fault propagationbetweenthe processorssupportingeachfunction, but not within them. This model, however, overlooksa newopportunitythatcouldbecreatedby morefine-grainedpartitioning.

If functionshave no internalpartitioning,thenall their softwaremustbe assuredandcertifiedto the level appropriatefor that function. Thus,all the software in an autopilotfunctionis likely to requireassuranceto Level A of DO-178B(this,thehighestlevel of DO-178B, the guidelinesfor certificationof airbornesoftware[29,84], is for softwarewhosemalfunctioncouldcontributeto a catastrophicfailurecondition[28]), andthis discouragestheinclusionof any softwarethatis notstrictly essentialto thefunction.While thismaybea goodthing in general,it alsodiscouragesinclusionof servicesthatcouldhave a positivesafetyimpact,suchascontinuousself-test,or for-information-onlymessagesto the pilot.More generally, partitioningwithin a processorcould allow an individual function to bedivided into softwarecomponentsof differentcriticalities; eachcould thenbe developedandcertifiedto the level appropriateto its criticality; therebyreducingoverall costswhileallowing assuranceeffort to befocusedon themostimportantareas.Without partitioning,the concernthat a fault in lesscritical softwarecould have an impacton the operationofmorecritical softwarenecessarilyelevatesthe criticality of thefirst to that of thesecond;partitioningwould remove thedangerof fault propagationandallow thecriticality of eachsoftwarecomponentto beassessedmorelocally.

Theconsiderationsof thepreviousparagraphsuggestthatfor partitioningwithin a sin-gle processorit might be appropriateto limit attentionto the casewherethe processorissharedby thecomponentsof only a singlefunction. We might supposethat thesecompo-nentsconsistof oneimplementingthemainfunctionandseveralothersprovidingsubsidiaryservices.Sinceafault in themaincomponentamountsto afault in (thisreplicaof) theover-all function,thereseemslittle point in protectingthesubsidiarycomponentsfrom faultsinthe main component,and this suggeststhat partitioningcould be asymmetric(the maincomponentis protectedfrom thesubsidiaryones,but notviceversa).It is notclearwhethersuchasymmetrywouldprovideany benefitin termsof simplicity or costof thepartitioning


mechanisms,but thepoint is probablymootsinceotherscenariosrequirea symmetricap-proach.Onescenariois supportfor severalminor functions,for exampleundercarriageandweatherradar, onasingleprocessor. Wherethefunctionsarenotrequiredat thesametime,partitioningcouldperhapsbeachieved by giving eachonesolecommandof its processorwhile it isactive(thisis similarto “periodsprocessing”in thesecuritycontext), but themoregeneralrequirementis for simultaneousoperationwith symmetricpartitioning.Thesecondscenarioconcernsvery cost-sensitive applications,suchassingle-enginegeneralaviationaircraft. Hereit may be desirableto run multiple major functions(suchasautopilotandrudimentaryflight management)on a single(possiblynon-fault-tolerant)processor. Thereareevenproposalsto hostthesefunctionson mass-market systemssuchasWindows NT.Althoughonecanbeskepticalof this proposal(particularlyif “free flight” air traffic con-trol makesflight managementdatafrom generalaviationaircraftcritical to overallairspacesafety),it seemsworth examining the technicalfeasibility of symmetricpartitioning forcritical functionswithin asingleprocessor.

The currentfederatedarchitecturenot only usesa lot of computersystems,it usesalot of different computersystems:eachfunction typically hasits own uniquecomputerplatform. Thereis a high costassociatedwith developingandcertifying software to runon theseidiosyncraticplatforms. Logically independentof IMA, but coupledto it quitestronglyin practice,aremovesto definestandardizedinterfacesto theplatformsthatsup-port flight functionsandto introducesomeof theabstractionsandservicesprovidedby anoperatingsystem.TheARINC 653(APEX) [4] standardrepresentsastepin thisdirection.Developmentssuchasthis couldsignificantlyreducethecostof avionics softwaredevel-opmentandmight stimulatecreationof standardmodulesfor commontasksthatcouldbereusedby differentfunctionsrunningondifferentplatforms.

Thedesignchoicesfor partitioninginteractwith thosefor providing operatingsystemservices.Themajordecisionis whetherpartitioningis providedabove anoperatingsystemlayer(figure2.1(a)),or above a minimal kernel(or executive) with mostoperatingsystemservicesthenprovided separatelyin eachpartition (figure 2.1(b)). The first choiceis theway standardoperatingsystemsarestructured(with partitionsbeingclient processes),butit hasthe disadvantagethat partitioning then relies on a greatdeal of operatingsystemsoftware.Thesecondchoiceis sometimescalledthe“virtual machine”approach,andit hastheadvantagethatpartitioningreliesonly on thekernelandits supportinghardware.�

AnotherareawhereIMA hasthepotentialto reducecostsis throughimproveddispatchreliability. Critical flight functionsmusttoleratehardwarefaults,andsothey run on repli-catedhardware(typically quad-redundantor greaterfor primaryflight controlanddisplays,triple for autopilotandautoland,dualfor flight managementandyaw damping,andsingle�Someoperatingsystemsusethesecondmodel.It wasfirst employedin VM/370 [70], whichservedasthe

basisfor amajorearlysecuresystemdevelopment[7,35]. Fully virtualizing theunderlyinghardwareis expen-sive, so later “ � -kernels”suchasMachandChorusprovideda moreabstractinterface. Thesealsoproved tohavedisappointingperformance.Second-generation� -kernelsandcomparabletoolkitssuchasExokernel[49],Flux [31], L4 [38], andSPIN[11] achievegoodperformanceandintroduceseveralimplementationtechniquesrelevantto thedesignof partitionedsystems.

2.1. IntegratedModularAvionics 7

PartitionA PartitionB

OperatingSystem

Hardware

(a)

Hardware

PartitionA PartitionB

Kernel

OSServicesA OSServicesB

(b)

Figure2.1: Alternative OperatingSystem/PartitioningDesigns

for autothrottle).But despitethemassivecostof providing afault-tolerantplatformfor eachfunctionanddespitethe largenumberof separateprocessorsandothercomponentsavail-able(therecanbeasmany as50 processorsamongthemajor functionsof a largemoderntransportplane),thefederatedarchitecturedoesnot provide a largemargin of redundancynoroperationalflexibility. A singlefaultyprocessorin any functionmaybeenoughto pre-venttakeoff (therebyrequiringmaintenancein possiblylessthanidealcircumstances),andmultiplefaultsafflicting suchafunctionduringflight might triggeradiversionor haveevenmoreseriousconsequences.With IMA, in contrast,replicatedprocessorsarenot boundto a specificfunction,but canbe allocatedasrequired:normaloperationcancontinueaslong asthe total numberof nonfaulty processorsis sufficient to provide therequiredlevelof replicationto eachfunction. This increasesoverall safetymargins,while alsoallowingmaintenanceto bedeferred(e.g.,until theaircraft’s schedulebringsit to a majormainte-nancebase)[40].�

Theability to exploit this increasedredundancy andflexibility dependsonasystematicapproachto fault tolerancewithin functions(sothatthey arenot tightly boundto aspecificprocessor)andacrossthedistributedcoordinationmechanismsof theIMA platformitself.Designof fault-tolerantsystemsis notonly amassively difficult andexpensive activity (thebasicmechanismsof fault toleranceconcernthecoordinationof distributed,real-timesys-temsoperatingin thepresenceof faults,whichareamongthehardestproblemsin computerscience)but is oftenapervasiveone:thatis,mechanismsfor fault toleranceandredundancymanagementin avionicsareseldomencapsulatedasanoperatingsystemor middlewareser-vicebut insteadaffect thedesignof everypieceof softwarewithin thefunction.As aresult,it is generallyimpossibleto take software—oreven the designfor a pieceof software—from one function and reuseit in another, or on anotherplatform, even whenstandardssuchasAPEX areused,for thesestandardsconcernonly themechanicsof systemcallsanddo not addressthedeeperconcernsof systematicandtransparentfault tolerance.Anotherreasonfor the pervasive influenceof fault tolerancein currentsystemdesignsis that the�Currentimplementationsof IMA allocatefunctionsto processorsat startuptime; reconfigurationin flight

is a futureprospect.


thesemechanisms(andmostothersthat involve coordinationacrossmultiple processorsandfunctions)areseldomcompositional—meaningthatthereis no a priori guaranteethatelementsthateachwork ontheirown will alsowork in combination.Themassiveresourcesexpendedon systemsintegrationarea symptomof the lack of compositionalityprovidedby currentdesignpractices.

Thus,full realizationof thebenefitsof IMA requiresadoptionof modernconceptsforsystematic,compositional,fault-tolerantreal-timesystemdesign[55]. Thesewould re-ducethepervasive impactof fault tolerancein avionicssoftwaredevelopmentandprovidecostsavings andopportunitiesfor reusethat could be muchgreaterthan thoseprovidedby lower-level standardssuchasAPEX. Takento their conclusion,suchapproachescouldcompletelydecoupletheimplementationof flight functionsfrom thatof their fault-tolerantplatform andpossiblyenableeachto be certifiedseparately. The impactof suchdevel-opmentsonpartitioningis, first, a requirementthatthedistributedpartitioningmechanismsmustthemselvesberobustly fault tolerantand,second,thatthesemechanismsmustcooper-atewith operatingsystemor kernelfunctionsto providetheservicesrequiredfor systematicandtransparentfault tolerancein theimplementationsof flight functions.

Summarizingthisreview of issuesin IMA, weseethatpartitioningshouldbeconsideredboth within a single processorand acrossa distributed systemand that partitioninghasinteractionswith theprovisionof operatingsystemservicesandtransparentfault tolerance.In thenext sectionweexaminetherequirementsfor partitioninga little moreclosely.

2.2 Partitioning

Thepurposeof partitioningis fault containment:a failure in onepartitionmustnot prop-agateto causefailure in anotherpartition. However, we needto be careful aboutwhatkindsof faultsandfailuresareconsidered.Thefunctionin a partitiondependson thecor-rectoperationof its processorandassociatedperipherals,andpartitioningis not intendedto protectagainsttheir failure—thiscanbe achieved only by replicatingfunctionsacrossmultiple processorsin a fault-tolerantmanner. After all, eachfunction would be just asvulnerableto hardwarefailureif it hadits own processor. Rather, theintentof partitioningis to control theadditionalhazardthat is createdwhena functionsharesits processor(or,moregenerally, a resource)with otherfunctions.Theadditionalhazardis thatfaultsin thedesignor implementationof onefunctionmayaffect theoperationof otherfunctionsthatshareresourceswith it. � Now adesignor implementationfault in aflight functionis surelya very seriouseventandit might besupposedthat(a) suchfaultsaresoseriousthatit doesnot matterwhatelsegoeswrong,or (b) certificationensuresthatsuchfaultscannotoccur.Bothsuppositionswould, if true,diminishtherequirementsfor partitioning.

�Partitioningcanalsolimit the consequencesof transienthardwarefaults(by containingthemwithin the

partitionthatis directlyaffected),but thatis asidebenefit,nota requirement.

2.2. Partitioning 9

The first point is easilyrefuted: the whole thrustof aircraft certificationis to ensurethat failuresareindependent(andindividually improbable)if their combinationcould becatastrophic.Thus,while a designfault in, say, theautothrottlefunctionwould beserious,appropriatedesignandsystem-level hazardanalysiswill ensurethat it is not catastrophic,providedotherfunctionsdo not fail at thesametime. Allowing a fault in this functiontopropagateto another(e.g.,autoland)wouldviolatetheassumptionof independentfailures.Thus,far from a fault in a critical functionbeingso seriousasto renderconcernfor par-titioning irrelevant, it is theneedto containthe consequencesof sucha fault that renderspartitioningessential(andelevatesits criticality to at leastthatof themostcritical functionsupported).

It could be arguedthat both functionswill certainlybe lost if their sharedprocessorfails, so they surelywould not besharingif their correlatedfailure couldbecatastrophic.This overlooksa coupleof points.First,malfunctionor unintendedfunctionis oftenmoreseriousthansimplelossof function,andthe consequencesof a propagatingfault (unlikethoseof aprocessorfailure)maywell beof thesemoreseriouskinds.For example,abufferoverflow in onefunctionmight overwritedatain another, leadingto unpredictableconse-quences.(The PhobosI spacecraftwas lost in just this circumstance—whena keyboardbuffer overflowedinto thememoryof acritical flight controlfunction[14,17].) Second,theincreasedinterdependency wroughtby IMA may introducesharedresources—andhencepathsfor fault propagation—thatarelessobviousandmoreeasilyoverlooked thansharedprocessors.For example,functionsin separateprocessorswherecorrelatedfailure wouldnotbeanticipated(andwouldnotoccurin a federatedarchitecture)mightbecomevulnera-ble to fault propagationthroughasharedbusin anIMA architecture.

Returningto thesecondpoint raised(that certificationoughtto ensuretheabsenceofdesignandimplementationfaults),notethatcertificationrequiresassuranceproportionaltothe consequencesof failure. In a federatedarchitecture,suchconsequencesaregenerallylimited to thefunctionconcerned,sothatassuranceis relatedto thecriticality of thatfunc-tion. But, if thefailureof onefunctioncouldpropagateto others,thena low-criticality (andcorrespondinglylow-assurance)functionmightcauseahigh-criticalityfunctionto fail. Thismeansthateitherall functionsthatshareresourcesmustbeassuredto thelevel of themostcritical (suchelevationin assurancelevelsis directlycontraryto oneof thegoalsof IMA) orthatpartitioningmustbeusedto eliminatefault propagationfrom low-assurancefunctionsto thoseof highcriticality. Whendifferentfunctionsalreadyhappento have thesamelevelof assurance,theneedfor partitioningmaynot besogreat,andit hasbeensuggestedthatfunctionswith softwareassuredto Level A of DO-178Bmaybeallowedto shareresourceswithoutpartitioning.Note,however, thata fault thatcausesonefunctionto induceafailurein anothermight not affect the operationof the first (asnotedabove, a temporarybufferoverflow canhave this property).And althoughcertificationrequiresassuranceof theab-senceof suchunintendedeffectsaswell aspositive assurancethattheintendedfunctionisperformedcorrectly, it is generallymuchharderto provide thefirst kind of assurancethanthesecond.Furthermore,sharedresourcescreatenew pathwaysfor thepropagationof un-


intendedeffects,andthesepathwaysmight not have beenconsideredwhenassurancewasdevelopedfor the individual functions. Consequently, partitioningseemsadvisableevenwhenthefunctionsconcernedareof thesamelevel of criticality andall softwareis assuredto thesamelevel.

Summarizingthe discussionin this chapter, we mayconcludethat futureavionics ar-chitectureswill have the characterof distributed, ratherthanfederated,systemsandthatmultiplefunctions,of possiblydifferentlevelsof criticality andassurancewill besupportedby the samesystem. Resources,suchasprocessors,communicationsbuses,andperiph-eraldevices,maybesharedbetweendifferentfunctions. Sharedresourcesintroducenewpathwaysfor faultpropagation,andthesehazardsmustbecontrolledby partitioning.

Becausepartitioningis requiredto preventfault propagationthroughsharedresources,asuitablebenchmarkor “Gold Standard”for theeffectivenessof partitioningwouldseemtobeacomparablesystem(intuitively afederatedone)in whichtherearenosharedresources.This is capturedin thefollowing.

Gold Standard for Partitioning

A partitionedsystemshouldprovide fault containmentequivalent to an ide-alizedsystemin which eachpartition is allocatedan independentprocessorandassociatedperipheralsandall inter-partition communicationsarecarriedondedicatedlines.

Although this Gold Standardprovidesa suitablementalbenchmarkfor designersandcertifiersof partitioningmechanismsfor IMA, it is lessusefulasa“contract”with the“cus-tomers”of suchmechanisms.Thesecustomers—thatis, thosewho develop softwareforthe functionsthatwill run in thepartitionsof an IMA architecture—areassuredthat theirsoftwarewill beaswell protectedin a partitionasif it hadits own dedicatedsystem,butthey arenot provided with a concreteenvironmentin which to develop, test,andcertifythatsoftware.TheGoldStandardimpliesthattheenvironmentprovidedby thepartitionedsystemto aparticularapplicationfunctionmustbeindistinguishablefrom anidealizedsys-tem dedicatedto that functionalone,but this idealizedsystemis just that—animaginaryartifact—andnot onesuitablefor testingandevaluatingreal-world software.Theonly en-vironmentactuallyavailableis thepartitionedsystemitself, soits customersneedacontractexpressedin termsof thatenvironment.Thiscanbedoneasfollows: insteadof comparingtheenvironmentperceived by thesoftwarein a partitionto thatof an idealized,dedicatedsystem,we requirethattheenvironment(whatever it is) is onethatis totally unaffectedbythebehavior of softwarein otherpartitions.Thisleadsto thefollowing alternativestatementof ourGoldStandard.I amgratefulto David Hardin,DaveGreve,andMatt Wilding of CollinsCommercialAvionicsfor explain-

ing thisapproachandits motivationto me[113].

2.2. Partitioning 11

Alternative Gold Standard for Partitioning

Thebehavior andperformanceof softwarein onepartitionmustbeunaffectedby thesoftwarein otherpartitions.

This formulationis not only simplerandmoredirect thanthat involving an idealizedsystem,but it alsosuggestshow thecustomersof apartitionedsystemcandevelopandeval-uatetheirsoftware—forif softwarein onepartitionis unaffectedby thatin otherpartitions,it will runthesame(in termsof bothbehavior andperformance)whethertheotherpartitionsareinhabitedor empty. Thus,in particular, individual softwarefunctionscanbedevelopedandcertifiedusingtherealenvironmentof thepartitionedsystem,but with theotherparti-tionsempty(or, morelikely, containingstubsto providethedatasourcesandsinksrequiredby thefunctionunderexamination).TheAlternative Gold Standardensuresthat thecerti-fiedsoftwarewill behave exactly thesamewhenthoseotherpartitionsareinhabitedby real(andpossiblyfaulty) functions.

A problemwith the Alternative Gold Standardis apparentin the mentionof “datasourcesandsinks” in thepreviousdiscussion:softwarefunctionsresidingin separatepar-titions areseldomcompletelyindependent—someprovide dataor control inputsto others.This meansthat “unaffectedby the software in otherpartitions” needsto be qualifiedinsomeway that allows the effectsof intendedcommunicationswhile excluding thosethatareunintended.Thus,althoughthe Alternative Gold Standardis moreattractive thantheoriginaloneasa requirementsdefinitionfor partitioningisolatedfunctions,it needsfurtherdevelopmentbeforeit canserve asagoldstandardfor themoregeneralcaseof partitionedbut interactingfunctions.Whenrestrictedto isolatedfunctions,thebasicandtheAlterna-tiveGoldStandardsareverysimilar; indeed,if suitablyformalized,eachwouldbedefinablein termsof theother.

Theoriginal formulationof theGold Standardhastheadvantagethat it focusesatten-tion on thestructuraldifferencesbetweena partitionedsystemanda federatedone.Thesestructuraldifferencesintroducetwo classesof hazardsinto a partitionedsystem:a faultin onepartition could corruptcode,control signals,or data(in memoryor in transit)be-longingto another, or it couldaffect theability of anotherpartitionto obtainaccessto, orservicefrom, a sharedresource(suchasthe processoror a bus). In consideringissuesinthe designandassuranceof partitionedsystems,it is thereforeuseful to distinguishtwodimensions—spatialandtemporal—corresponding to thesetwo classesof hazards.

Spatial Partitioning

Spatialpartitioningmustensurethat softwarein onepartitioncannotchangethesoftwareor privatedataof anotherpartition(eitherin memoryor in transit)norcommandtheprivatedevicesor actuatorsof otherpartitions.


Temporal Partitioning

Temporalpartitioningmustensurethat the servicereceived from sharedre-sourcesby thesoftwarein onepartitioncannotbeaffectedby thesoftwareinanotherpartition.This includestheperformanceof theresourceconcerned,aswell astherate,latency, jitter, anddurationof scheduledaccessto it.

Themechanismsof partitioningmustblock thespatialandtemporalpathwaysfor faultpropagationby interposingthemselvesbetweenavionicssoftwarefunctionsandthesharedresourcesthatthey use.In this way, thepartitioningmechanismscancontrolor “mediate”accessto sharedresources.In the next chapter, we considerthe mechanismsthat canbeusedto providemediationin eachof thetwo dimensionsof partitioning.

Chapter 3

Issues and Mechanisms

As discussedin the previous chapter, issuesin partitioningariseat two levels: within asingleprocessorandacrossa distributedsystem. Issuesin partitioningalsointeractwiththosein fault tolerance. We considerthesetopicsseparatelybelow and further separatetheminto considerationof spatialandtemporalpartitioning.

3.1 Partitioning Within a Single Processor

We startby consideringpartitioningwithin a singleprocessor. We sometimesusetheneu-tral termapplicationto referto thecomputationalentitywithin eachpartition;thiscouldbea completeavionicsfunction(e.g.,a yaw damper)or a partof one. Dependingon theim-plementation,anapplicationcouldcorrespondto theoperatingsystemnotionsof processorvirtual machine, or it couldbesomedifferentnotion.An applicationwill generallybecom-posedof smallerunitsof computationthatarecalledor scheduledseparately;wegenerallyrefer to theseas tasks. Again dependingon the implementation,thesemaycorrespondtoanoperatingsystemnotionsuchasthreador lightweightprocess. Partitioningmustpreventapplicationsinterferingwith oneanother, but the taskswithin a singleapplicationarenotprotectedfrom eachother. We focusfirst onpartitioningin thespatialdimension.

3.1.1 Spatial Partitioning

The basicconcernof spatialpartitioning is the possibility that software in one partitionmight write into the memoryof another: memory is often picturedas a one- or two-dimensionalgrid; hence,thereferenceto thespatialdimensionfor this aspectof partition-ing. Memory includesthatusedto storeprogramsaswell asdata;although,in embeddedsystems,it is sometimespossibleto hold theformerin ROM whereit cannotbeoverwrittenby errantsoftware.

Hardwaremediationprovidedby amemorymanagementunit (MMU) is theusualwayto guardagainstviolations of spatialpartitioning. The detailsvary from one processor

13

14 Chapter3. IssuesandMechanisms

designto another, but thebasicideais thattheprocessorhas(at least)two modesof oper-ationand,whenit is in “user” mode,all accessesto memoryaddressesareeithercheckedor translatedusingtablesheld in the MMU. A layer of operatingsystemsoftware(gen-erally calledthe kernel) managesthe MMU tablesso that the memorylocationsthat canbe readandwritten in eachpartition aredisjoint (apart,possibly, from certainlocationsusedfor inter-partition communications).Thekernelalsousesthe MMU to protectitselffrom beingmodifiedby softwarein its client partitionsandmustbecarefulto managetheuser/supervisormodedistinctionsof the processorcorrectlyto ensurethat the mediationprovided by the MMU cannotbe bypassed.(In particular, entry andexit from the kernelneedsto behandledcarefullysothatsoftwarein a partitioncannotgainsupervisormode;someprocessorshave haddesignflaws thatmake this especiallydifficult [43].)

Softwareexecutingin apartitionaccessesprocessorregisterssuchasaccumulatorsandindex registersaswell asmemory. Generally, thekernelarrangesthingssothatthesoftwarein onepartitionexecutesfor awhile, thenanotherpartitionis givencontrol,andsoon;whenonepartitionis suspendedandanotherstarted,thekernelfirst savesthecontentsof all theprocessorregistersin memorylocationsdedicatedto thepartitionbeingsuspendedandthenreloadstheregisters(includingthosein theMMU thatdeterminewhich memorylocationsareaccessible)with valuessaved for thepartition that executesnext. The softwarein thepartitionresumeswhereit left off andcannottell (apartfrom thepassageof time while itwassuspended)thatit is sharingtheprocessorwith otherpartitions.

Thedescriptionjustgivenresemblesclassicaltime-sharing,wherepartitionscanbesus-pendedat arbitrarypointsandresumedlater. Somevariationsarepossiblefor embeddedsystems.For example,if partitionsareguaranteedanuninterruptibletime sliceof knownduration,they canbeexpectedto have finishedtheir tasksbeforebeingsuspendedandcanthenberestartedin somestandardstateratherthanresumedwherethey left off. Thiselim-inatesthecostof saving theprocessorregisterswhena partitionis suspended(but at leastsomeof them—includingtheprogramcounter—mustberestoredto standardvalueswhenthepartitionis restarted).Wecanreferto thetwo typesof partitionswappingarrangementsastherestoration andrestartmodels,respectively.

In eithercase,the requirementon the mediationmechanismsmanagedby the kernelis that thebehavior perceivedacrossa suspensionby thesoftwarein eachpartitionis pre-dictablewithout referenceto anything externalto thepartition. In the“restoration”model,theprocessorstatemustberestoredto exactlywhatit wasbeforesuspension;in the“restart”model,it mustberestoredto someknown state.It maybeacceptablein the lattercasetospecifythatsomeregistersmaybe“dirty” on restartandthat thesoftwarein a partitionisrequiredto work correctlywithoutmakingassumptionsontheir initial contents—thissavesthecostof restoringtheseregistersto standardvalues(obviously, theprogramcounterandMMU registersmustberestored).� Therequirementto make behavior predictableacrossthesuspensionandresumptionof a partitiongeneratesin turn therequirementthattheop-�Althoughpartitioninghasmuchin commonwith computersecurity, this is oneaspectwherethey differ:

“dirty” registersareanathemain computersecuritybecausethey provide a channelfor informationflow from

3.1. PartitioningWithin aSingleProcessor 15

erationof theprocessormustbespecifiedpreciselyandaccuratelywith respectto all of itsregisters—forit is importantthat registersaving andrestorationor reinitializationshouldnot overlookvisibleminor registerssuchasconditioncodesandfloatingpoint/multimediamodesandthat hiddenregisters,suchas thoseassociatedwith pipelinesandcaches,re-ally arehidden. (Again, processorsoften have designglitchesor errorsandomissionsindocumentationthatmake it difficult to accomplishthis [98].)

In theapproachjust outlined,themechanismsof spatialpartitioningcomprisethepro-cessorandits MMU andthekernel.Thereis muchadvantage,from thepointof view of as-suranceandformalspecification,if thesemechanismsaresimple.Unfortunately, commod-ity processors,their MMUs, andassociatedfeaturessuchasmemorycachesaregenerallydesignedfor highperformanceandextensive functionalityratherthansimplicity. Althoughafastprocessoris oftendesired,thefunctionalityof MMUs andcachecontrollersgenerallyexceedsthat requiredfor embeddedsystems;MMUs, in particular, areusuallydesignedto provide a flexible virtual memoryandcontainlargeassociative lookuptables—whereasfor partitioning,a simplefixedmemoryallocationschemewould beadequate.� Thelatterwould alsobe far lessvulnerableto bit-flips causedby single-event upsets(SEUs)thanatraditionalmillion-transistorMMU. However, becausethey areusuallyhighly integratedwith theirprocessor, it canbedifficult or evenimpossibleto replaceMMUs andcachecon-trollerswith simplerones,but considerationshouldbegivento this issueduringhardwareselection.

An alternative to spatialpartitioningusinghardwaremediationis SoftwareFault Isola-tion (SFI) [109]. Theideahereis similar to arrayboundscheckingin high-level program-ming languagesexceptthatit is appliedto all memoryreferences,not just thosethatindexinto arrays. By examiningthe machinecodeof the software in a partition, it is possibleto determinethe destinationsof somememoryreferencesandjumpsandhenceto check,statically, whetherthey aresafe.Memoryreferencesthatindirectthrougha registercannotbecheckedstatically, soinstructionsareaddedto theprogramto checkthecontentsof theregisterat runtime,immediatelyprior to its use.By usingmoreelaboratestaticanalysisorprogramverificationtechniques(e.g.,to ensurethatanindex registerhasnotbeenchangedsincelastchecked),it is possibleto minimizethenumberof runtimechecks;by usingmod-estoptimizationsof this kind, an overheadof just 4% hasbeenreportedfor the runtimechecksof SFI [109].

onepartitionto its successor. Theissuesunderlyingthisdifferenceareconsideredin Section4.4ApplicationtoPartitioning.�

MMUs arealsoheavily optimizedfor speed:in somearchitectures,the MMU will starta readfrom thememoryusing the currentpagemap beforeit hasdeterminedwhetherthat is still valid; if it is not valid,theMMU squashesthebusreadtransactionbeforeit completes.Also, for efficiency, multiple copiesmaybemaintainedfor someof theassociative lookuptables,andthesemustbekeptconsistentwith eachother. All thisis donein thecontext of speculative out-of-orderexecutionwhereproviding assurancefor correctnessof theseoptimizationsis nontrivial. A separateproblemis the timing uncertaintyintroducedby theseoptimizations:ratiosof 2 to 1 betweenaverage-caseandworst-casetimingsarenotuncommon[52] (seealsohttp://www.intelligentfirm.com/).


Static(i.e.,compile-time)analysisof informationflow within individualprogramswrit-tenin high-level languageshaslongbeenatopic in computersecurity. In its simplestform,someof thevariablesusedby theprogramarelabeledHIGH andsomeLOW, andthegoalisto checkwhetherinformationfrom aHIGH variablecanever influencethefinal valueof onelabeledLOW. Techniquesfor informationflow analysisincludeapproximatemethodssim-ilar to typechecking[21,107]or to dataflow analysis[6] aswell asexactmethods[63] andthosethatrely onformalproof[81]. It is possiblethatapproachesbasedonthesetechniquescouldreduce,or eveneliminate,theruntimeoverheadof SFI.

AlthoughSFI usuallyimposesa smalloverheadon memoryreferenceswithin a parti-tion, it cangreatlyreducethe costof controlledreferencesor procedurecalls acrosspar-titions (comparedwith hardwaremediation,sincethecostof a partitionswap is avoided).However, for reasonsdiscussedlater(page26),suchcross-partitionreferencesmaynot beacceptablein somepartitionedarchitectures,sotheadvantagewouldbemootin thosecases.

A disadvantageof SFIcomparedwith hardware-mediatedpartitioningis thatit imposesanadditionalanalysisandcertificationcostoneveryprogram;whereashardwaremediationhasthe one-timecostof designing,implementing,andcertifying the partitioningmecha-nismsof thekernelandits supportinghardware. On theotherhand,theanalysisrequiredfor SFI lendsitself to powerful automation(cf. “extendedstaticchecking”[22], and“proofcarryingcode”[80]) wherethecertificationcostwould betransferredto theone-timecostof certifying thetools.

Even without automation,SFI may have advantagesof costandsimplicity in “asym-metric” applicationswherea singlefunction is allocatedto a processorbut it is desiredtoincludesomelesscritical “nice-to-have” features.Thesecouldbepartitionedfrom themainsafety-criticalfunction by SFI, while the latter runsunchanged.SFI might alsobe cost-effective in partitioningfunctionsof similarassurancelevelsthatalreadyrequiresignificantanalysis(e.g.,two Level A functions). And SFI could alsobe usedto provide additionalprotectionwithin partitions(i.e.,amongtasks)establishedby hardwaremediation.

OneconcernaboutSFI,especiallywhenstaticanalysisis usedto optimizeawaymanyof theruntimechecks,is thatit provideslittle protectionagainsthardwarefaults(e.g.,SEU-inducedbit-flips) thatcausememoryaddressesthatwerecorrectwhenanalyzedto beturnedinto onesthatareincorrectwhenexecuted.Thebadmemoryreferencewill becaughtonlyif a runtimecheckis in the right place;a hardware MMU, on the otherhand,mediateseveryreferenceat its timeof execution.It wasearlierstatedthatthepurposeof partitioningis to protectfunctionsagainstfaultsof designandimplementationin otherfunctions,notto guardagainsthardware faults—sincethesecould afflict the function even if it had itsown dedicatedprocessor—but a hardwarefault that leadsto a violation of partitioningisnot a fault that would have afflicted the function if it had its own processor, so it seemsthattheconcernis legitimate.However, a little analysisshows thattheincreasedexposureto hardware faults is small. Supposethe function in which we are interestedsharesitsprocessorwith � otherfunctionsof similar sizeandthat theprobabilityof anSEUhittingany oneof themis � . Supposefurther that the probability that an SEU in one function


will causeit to violate SFI partitioningandto afflict someotherfunction is . Thentheprobabilityof anSEUdirectly or indirectly affectingtheoriginal functionchangesfrom �to �� whenthefunctionis movedfrom adedicatedto asharedprocessor. (Noticethatthis is independentof � : thechanceof anSEUhitting somewhere increasesby afactorof � ,but thechancethattheconsequentmemoryerroraffectsthefunctionconcernedis reducedby thesamefactor.) This small increasein probabilityis unlikely to besignificant,andweconcludethatthepossibilityof SEU-inducedaddressingerrorsdoesnot invalidateSFI.

Perhapssurprisingly, it is someimplementationsof hardware-mediatedpartitioningthatseemmorevulnerableto this kind of fault scenario. Although an SEU in an individualfunctioncannotleadto a violation of partitioningwhenmemoryreferencesaremediatedby anMMU, anSEUin theMMU itself couldbequitedangerous.If theMMU is a largedevice with millions of transistors,thenthe possibilityof anupsetcannotbe overlooked,andachangeto onebit in anaddresstranslationregistermaycausethememoryreferencesof onepartitionsystematicallyto infringeon thememoryof another. It seemsto methatindesignswhereit is possibleto provide a customMMU, it would beprudentto ensurethatthis is eitherfault tolerantor that it merelychecksratherthantranslatesaddresses(sothata doublefault would be neededto violatepartitioning);bestof all might be relocationorcheckingwith hardwiredvalues.

Sofar, ourconsiderationof partitioninghasconsideredonly theprocessorandthemem-ory andhasassumedthatdifferentpartitionsaremeantto be isolatedfrom eachother;wenow needto considerinter-partitioncommunicationsanddevices. Like partitioningitself,therearetwo dimensionsto inter-partition communication:the spatialdimensionis con-cernedwith whereandhow datais transferredfrom onepartitionto another, andthetem-poraldimensionis concernedwith whetherandhow synchronizationis performedandhowonepartition invokesservicesfrom another. We postponeconsiderationof the latter top-ics to thediscussionof temporalpartitioningin Section3.1.2andfocushereon thespatialdimension.

Theobviousway to communicatedatafrom onepartitionto anotheris to copy it froma buffer in memorybelongingto thefirst partitioninto a separatebuffer in thememoryofthesecond.Becauseonly thekernelhasaccessto thememoryof bothpartitions,it mustperformthecopying and,sinceit generallyrunswithoutmemoryprotection,it mustcheckcarefullyagainstbuffer overruns.A moreefficient schemeusesa singlebuffer in memorylocationsthat areamongthosethe sendingpartition canwrite andthe receiver can read(bothMMU andSFI formsof memoryprotectioncando this);datacanthenbecopiedintothesharedbuffer by thesendingpartitionwithout theactiveparticipationof thekernel.Thereceiving partitionmustassumethat thesendingonecanwrite arbitrarydataanywhereintheir sharedbuffers whenever it hascontrol,andits verificationmustbeperformedunderthisassumption.It seemscleanestif separatebuffersareusedfor eachdirectionof transfer,but bidirectionalbuffers may alsobe acceptable.It is, however, importantthat separatebuffersareusedfor eachpair of partitions(otherwise,partitionA couldoverwritethedataof B in C’ssingleinputbuffer).


Observe that it is importantto restrictinter-partitioncommunicationsto thosethatareintended:onepartitionshouldbeableto senddatato anotheronly if thatcommunicationis authorizedin the specificationof the systemconfiguration(andthe receiving partitionmustthenhave a buffer to receive it). A relatedtopic is how onepartitionshouldnametheotherpartitionswith which it communicates.Absoluteaddresses(e.g.,“sendthisdatumtoPartition7”) leadto a rigid andfragilesystemorganizationandareto bedeprecatedon thisaccount.Functionaladdresses(e.g.,“sendthis datumto thepitch autopilot”) arelittle bet-ter: they build assumptionsaboutthesystemstructureinto individualapplicationsandlimittheopportunitiesfor reuseandreconfiguration.Relative addressing(e.g.,“sendthis datumout on my Port7”) allows thebindingof namesto specificinter-partitioncommunicationchannelsto be postponeduntil systemconfigurationtime (andmay allow somedynamicreconfiguration),but requiresa databaseto recordwhattypeof dataor serviceis provided(or expected)on a givenport. Thebestarrangementmaybeonewherepartitionsusethetypeof dataor serviceprovidedor expectedasthenameof theport concerned(e.g.,“sendthisdatumoutonmyair-data-samplesport,” or “get meanair-data-sample”);thebindingof thesenamesto inter-partitionchannelscanbedoneduringsystemconfigu-ration,or at runtime. In the lattercase,we have somethinglike a publish-subscribearchi-tecture[82]; thisprovidesexcellentsupportfor dynamicreconfiguration,but its applicationto life-critical systemsis still an issuefor research.(Someavionicssystemsusethis typeof namingor addressingscheme,but not in a way thatis tightly integratedwith their fault-tolerancemechanisms.)

Softwarein onepartitionshouldnot make assumptionsaboutwhentasksin otherpar-titions arescheduled(taskswithin somepartitionsmay be dynamicallyscheduled);this,combinedwith normally asynchronouscommunication,meansthat careis neededwhencommunicatingtime-sensitive data.For example,a taskthatcollectsfrom its inputbuffer asensorsamplecontributedby anotherpartitionneedsto know whenthatsamplewastaken.The usualarrangementis to attacha time stampto the sample(sinceboth partitionsarerunningin thesameprocessor, they have accessto a commonclock). However, theutilityandinterpretationof a sensorsampledependsnot only on its age,but alsoon its accuracyandthedynamicsof thephysicalprocessbeingmeasured(e.g.,analtimeterreadingthatis 1secondold is muchlessusefulif theaircraftis landingthanif it is in cruise).Someof thesefactorsarelikely to bemuchbetterknown to thepartitionthatprovidesthesensorsamplethanto theonethat receivesit, andduplicatingtheknowledgein bothplacesis expensiveandraisestheproblemof ensuringconsistency. Instead,it seemsbestif theprovider of thedataalsoprovidesa compactdescriptionof its temporalinterpretation.Kopetzhasmadeaninterestingproposalof this kind underthenametemporal firewall [53,57], whichexistsin two variants.A phase-insensitivesensorsampleis providedwith a timeanda guaranteethatthesampledvalueis accurate(with respectto aspecificationpublishedby thepartitionthatprovidesit) until theindicatedtime. For example,supposethatengineoil temperaturemaychangeby at most1% of its rangepersecond,that its sensoris completelyaccurate,andthatthedatais to beguaranteedto 0.5%.Thenthesensorsamplewill beprovidedwith


atime500msaheadof theinstantwhenit wassampled,andthereceiverwill know thatit issafeto usethesampledvalueuntil theindicatedtime. This is muchmoreusefulthanatimestampthatmerelyrecordswhenthesamplewastaken.A phase-sensitivetemporalfirewallis usedfor rapidly changingprocesseswherestateestimationis required;in addition tosensorsampleandtime, it providestheparametersneededto performstateestimation.Forexample,alongwith thesampledaltitudeit maysupplyverticalspeed,sothataltitudemaybeestimatedmoreaccuratelyat thetimeof use.

In additionto communicationsbetweenpartitions,we mustexaminecommunicationsbetweenpartitionsanddevices. Devices,which includesensorsandactuatorsaswell asperipheralssuchas massstorage,have implicationsfor both temporaland spatialparti-tioning. Most devicesraisean interruptwhendatais availableor whenthey needservice.Suchinterruptsaffect thetiming andlocusof control,andconsiderationof their impactispostponedto thediscussionon temporalpartitioningin Section3.1.2;herewe concentrateon therelationshipof devicesto spatialpartitioning.Devicesimpactspatialpartitioninginthreeways: they needto beprotectedagainstaccessby thewrongpartition,they mustnotbeallowedto becomeagentsfor violatingpartitioning,andthey maythemselvesneedto bepartitioned.

The simplestcaseis wherea device “belongs” to somepartition and shouldnot beaccessedby others. Most modernprocessorsusememory-mappedI/O, meaningthat in-teractionwith devicesis conductedby readingandwriting to registersthatarereferencedlike ordinarymemorylocations. In thesecases,the mechanisms(MMU or SFI) usedtoprovide ordinarymemoryprotectioncanalsoprotectdevices. If memoryprotectionis in-sufficiently fine-grainedto permitdevicesto beallocatedto partitionsasdesired,thenit willbe necessaryto createspecialdevice managementpartitionsthat own several devicesbutaretrustedto keepthemseparate.Similararrangementswill benecessaryif severaldevicesareattachedto a databusor remotedataconcentrator(andmayalsobeusefulif multicastcommunicationservicesaredesired).Of course,the trust in suchmultiplexing partitionsneedsto be justified by suitableverificationand assurance.An alternative to providingdevice managementpartitionsis to performthesefunctionsin the kernel. The argumentagainstdoingthis is thatthepropertiesof thekernelmustbeassuredto averyhighdegree,so thereis muchadvantageto keepingits functionality assimpleaspossible. It shouldbeeasierto provide assurancefor a kernelthatprovidesmemoryprotection,plusseparatedevice managementpartitions,thanfor akernelhaving bothfunctions.

Somedevicesmay be sharedby morethanonepartition. Suchdevicescomein twoforms: thosethatneedprotectionandthosethatdonot. An exampleof theformeris a sen-sorthatperiodicallyplacesa samplein a device register. Thereseemsno harmin allowingtwo partitionsbothto have readaccessto thememorylocationcontainingthatdevice reg-ister. Devicesthatacceptcommandsaremoreproblematicalin that faulty softwarein onepartitionmay issuecommandsthat renderthe device inoperableor otherwiseunavailableto otherpartitions.Protectionby aspecialdevicemanagementpartitionseemsnecessarytomediateaccessin thesecases.(TheClementinespacecraftwaslost whena softwarefault


causedgarbageto besentover anunmediatedbus,whereit wasinterpretedby anattacheddevice asacommandto fire all thethrusterswithout limit.) Noticethatsuchadevice man-agementpartitionmustplay a moreactive role in checkingor controlling thedevice thanthesimplemultiplexing device managementpartitionsdescribedearlier.

Device managementpartitionsalsoarenecessaryto mediateaccessto truly sharedde-vicessuchasmassstorage.In thesecases,it is usualfor thedevice managerto synthesizea service(e.g.,a file system)ratherthanjust mediateaccessto theraw device (e.g.,a disk)andto partitiontheserviceappropriately(e.g.,with aseparate“virtual” file systemfor eachclientpartition).A devicemanagerof thiskind poseschallengesto assurancethataresimi-lar to thoseof themainmemorypartitioningmechanism,sinceflawscouldallow oneclientpartitionto write into areasintendedfor another.

Massstorageandotherdevicesthat transferlarge amountsof dataat high speedgen-erally do soby directmemoryaccess(DMA) ratherthanthroughmemory-mappeddeviceregisters(which are limited to a few bytesat a time). Dependingon the processorandmemoryarchitecture,DMA devicesmaybeableto addressmemorydirectly, without themediationof the MMU. This arrangementhasthe potentialto violate partitioningsincefaulty softwaremayinstructthedevice to usea region of memorybelongingto somepar-tition otherthanits own; a fault in thedevice itself couldhave a similar effect. A simplesolution is to interposesomecheckingor limiting mechanisminto the device’s memoryaddresslines(e.g.,by cuttingor hard-wiringsomeof them)sothattherangeof addressesitcangenerateis restrictedto lie within thatof thepartitionthatmanagesit. Anothersolutionis to isolateeachDMA device to a private bus with a dual-portedmemorybridging theprivateandmainsystembuses.

3.1.2 Temporal Partitioning

Our context is real-timeembeddedsystems,wherecorrectnessrequiresnot only that theright resultsare produced,but that they areproducedat the right time. The concernoftemporalpartitioningis to ensurethatactivities in onepartitiondonotdisturbthetiming ofeventsin otherpartitions.

Themostgrossconcernsarethatfaultysoftwarein onepartitionmightmonopolizetheCPU,or thatit mightcrashthesystemor issuea HALT instruction—effectively denying ser-viceto all otherpartitions.Otherscenariosthatcancauseapartitionto fail to relinquishtheCPUon time includesimplescheduleoverruns,whereparticularparametervaluescauseacomputationto take longerthanits allottedtime,andrunawayexecutions,whereaprogramgetsstuckin a loop.

Although their manifestationsare in the temporaldimension,systemcrashesand in-structionsthathalttheCPUareusuallypreventedby themechanismsof spatialpartitioning.In particular, HALT andotherdangerousinstructionsusuallycannotbe issued(or, rather,they causea trapto thekernel)whenin usermode.Therearereports,however, thatsomesteppingsof somecommodityprocessorshaveuntrappedinstructionsthatcanhalt theCPU,


or user-modeinstructionsthatcan“hang” whensuppliedwith certainparameters(e.g.,seehttp://www.x86.org; alsoreference98notes102bugsreportedupto 1995in variousversionsandsteppingsof theIntel 80X86architecture,andreference8 documentsacompa-rablenumberin laterprocessors).It is importantto know thesecharacteristicsof theprecisesteppingof theprocessoremployed(which mayrequirea nondisclosureagreement),but itis difficult to provide a completesolutionto suchuntrappedhardwareflaws. Perhapsthebestthat canbe doneis to useSFI-like techniquesandto scanthe machinecodeof eachapplicationandinsertruntimechecksasnecessaryto prevent executionof dangerousin-structionsor parametervalues(a purelystaticcheckwill beinadequateif parametervaluescanbeconstructedor modified—eitherunderprogramcontrolor by anSEU—atruntime).

The last-ditchescapefrom a haltedor locked-upCPU is a watchdogtimer interruptmanagedby thekernel. This is not certainto provide recovery, however, unlessthebasickernel designis correct: for example,designfaults in the Magellanspacecraftled to arunawayexecutionin whichaprogramsatin a loopthatdid nothingbut resetthewatchdogtimer [18, pp. 209–221][25,51],� andnot all haltedor “hung” processorsrespondto thetimer interrupt.Recovery in thesedire casesusuallydependson a systemreset(or cyclingthepowersupply, whichcausesareset),whichmaybeinvokedeithermanuallyor by othercomponentsin adistributedfault-tolerantsystem(which is how Magellanrecovered).

Runaway executionsin the kernel, lockups,anduntrappedhalt instructionscould allafflict a processordedicatedto a single function, and so their treatmentis more in thedomainof system-level designverificationor fault tolerancethanpartitioning. Overrunsor runawayswithin a function,however, aregenuinelytheconcernof partitioningandareusuallycontrolledthroughtimer interruptsmanagedby thekernel: thekernelsetsa timerwhenit givescontrol to a partition; if thepartitiondoesnot relinquishcontrolvoluntarilybeforeits time is up, the timer interruptwill activatethekernel,which thenwill thentakecontrolaway from theoverrunningpartitionandgive it to anotherpartitionunderthesameconstraints.

Merely takingcontrolawayfrom anoverrunningpartitiondoesnotguaranteethatotherpartitionswill beableto proceed,however, for theoverrunningpartitioncouldbeholdingsomeshareddevice or otherresourcethat is neededby thoseotherpartitions. The kernelcouldbreakany locksheldby theerrantpartitionandforcibly seizetheresource,but thismaydolittle goodif theresourcehasbeenleft in aninconsistentstate.Theseconsiderationsreinforcetheearlierconclusionthatdevicesandotherresourcescannotbedirectly sharedacrosspartitions.Instead,amanagementpartitionmustown theresourceandmustmanageit in sucha way thatbehavior by oneclient partitioncannotaffect theservicereceivedbyanother.

�The flaw in Magellanwasin the designof its kernel(sensitive datastructuresweremanipulatedoutside

the protectionof a critical section,so an interrupt could leave them in an inconsistentstate). Suchflawswouldbeunconscionablein asafety-criticalsystem:thedesignof thecorehardwareandsoftwaremechanismssimplyhave to becorrectin thesesystems.In additionto skilledandexperienceddesigners,formalmethodsofspecificationandanalysismaybevaluablefor thispurpose(designdiversityis implausibleat theselevels).


Anotherproblemcanariseif theoverrunningpartition is performingsomeserviceonbehalfof anotherpartition: it will generallybenecessaryto notify theinvokingpartition(thenext time it is scheduled)of thefailureof theserviceprovidedby theother. Theinvokingpartition musthave enoughfault tolerancethat it can do somethingsensibledespitethefailure of the service. It may alsobe necessaryfor the kernel to performsomeremedialactionon the partition that overranits allocation. This could force that partition to do arestartnext time it is scheduled,or couldsimplynotify thepartitionof its failureandleaverecovery (e.g., the killing of orphans)to the operatingsystemfunctionsresidentin thatpartition.

Timeoutmechanismssuchas thosejust describedensurethat eachpartition will getenoughaccessto theCPUandotherresources,but real-timesystemsneedmorethanthis:thetaskswithin partitionsneedto getaccessto theCPUandto devicesandotherresourcesat the right time and with greatpredictability. This meansthat discussionof temporalpartitioning cannotbe divorcedfrom considerationof schedulingissues. The real-timetaskswithin a partitiongenerallyconsistof iterative tasksthat mustbe run at somefixedfrequency (e.g.,20 timesa second)andsporadic tasksthat run in responseto someevent(e.g.,whenthe pilot pressesa button); iterative tasksoften requiretight boundson jitter,meaningthat they mustsamplesensorsor deliver outputsto their actuatorsat very preciseinstants(e.g.,within a millisecondof their deadline),andsporadictasksoften have tightboundson latency, meaningthat they mustdeliver anoutputwithin someshortinterval oftheeventthattriggeredthem.

Therearetwo basicwaysto schedulea real-timesystem:staticallyor dynamically. Ina staticschedule,a list of tasksis executedcyclically at a fixedrate. Tasksthatneedto beexecutedatafasterrateareallocatedmultipleslotsin thetaskschedule.Evensporadictasksarescheduledcyclically (topoll for inputandprocessit if present).Themaximumexecutiontimeof eachtaskis calculated,andsufficient time is allocatedwithin thescheduleto allowit to run to completion: thus,one tasknever interruptsexecutionof another(althoughataskmay be terminatedif it exceedsits allocation). Notice that this meansthat a long-durationtaskmay needto be broken into several smallerpiecesto make room for shorttaskswith higheriterationrates.Thescheduleis calculatedduringsystemdevelopmentandis notchangedat runtime(althoughit maybepossibleto selectamongafixedcollectionofdifferentschedulesat runtimeaccordingto thecurrentoperatingmode).

In a dynamicschedule,on theotherhand,thechoiceandtiming of which tasksto dis-patchis decidedat runtime.Theusualapproachallocatesa fixedpriority to eachtask,andthesystemalwaysrunsthehighest-prioritytaskthatis readyfor execution.If ahigh-prioritytaskbecomesready(e.g.,dueto a timeror externalinterrupt)while a lower-priority taskisrunning,the lower-priority taskis interruptedandthehigh-priority taskis allowed to run.Notethatthis requiresa context-switchingmechanismto save andlaterrestorethestateofthe interruptedtask. Thechallengein dynamicschedulingis to allocatepriorities to tasksin sucha way that overall systembehavior is predictableandall deadlinesaresatisfied.Originally, variousplausibleandadhocschemesweretried(suchasallocatingprioritieson


thebasisof “importance”),but thefield is now dominatedby theratemonotonicscheduling(RMS)schemeof Liu andLayland[66]. UnderRMS,prioritiesaresimplyallocatedon thebasisof iterationrate(thehighestprioritiesgoingto the taskswith thehighestrates)and,undercertainsimplifying assumptions,it canbeshown thatall taskswill meettheir dead-linesaslong astheutilizationof theprocessordoesnotexceed69%(thenaturallogarithmof 2—higherutilizationsarepossiblewhenthe taskiterationratessatisfycertainrelation-ships).Someof thesimplifying assumptions(e.g.,thatthecontext-switchtime is zeroandthattasksdonotshareresources)have beenlifted or reducedrecently[62,69,96].

The choicebetweenstaticanddynamicschedulingis a contentiousone(Locke [67]providesagooddiscussion).Thebasicargumentsin favor of staticschedulingareits com-pletepredictabilityandthesimplicity of its implementation;theargumentsagainstarethatall tasksmustrunatamultipleof thebasiciterationrate(sothatsomerunmoreor lessfre-quentlythanis ideal for their control function),thehandlingof sporadictasksis wasteful,andlong-runningtasksmustbebrokeninto multiple,separatelyscheduledpieces(to makeroomfor taskswith fasteriterationrates).Theargumentsin favor of dynamicschedulingarethat it is moreflexible andcopesbetterwith occasionaltaskoverruns;the argumentsagainsthingeon the difficulty of giving completeassurancethat a given tasksetwill al-waysmeetits deadlinesunderall circumstances.(Thefactorsthatmustbeconsideredarecomplex andnot all arefully characterized;errorsof understandingor judgmentarenotuncommon.For example,themuchpublicizedcommunicationsbreakdowns betweenthe1997Mars Pathfinderandits Sojournerrover weredueto priority inversionsin its RMSscheduler. Priority inversionsarea well-understoodproblemin dynamicallyscheduledsystems,with a well-characterizedsolutioncalled“priority inheritance”[20,96] that wasavailable,but notused,in thecommercialreal-timeexecutive usedfor Pathfinder.)

Themechanismsof bothstaticanddynamicschedulinghave to bemodifiedto operatein a partitionedenvironment,andthesemodificationschangesometraditionalexpectationsaboutthetradeoffs betweenthetwo approaches;in addition,partitioningcreatesopportuni-tiesfor hybridapproachesthatcombineelementsof bothbasicmechanisms.Thetraditionalschedulingproblemis to ensuresatisfactionof all deadlines,given informationabouttherateanddurationof the tasksconcerned.It is assumedthat this informationis accurate;if it is not—if, for example,sometask runs longer or requestsservicemore often thanexpected—thenthesystemmayfail. Whenall the tasksin thesystemarecontributing tosomesingleapplication,sucha failuremaybeundesirablebut will not have repercussionsbeyondthoseconsequenton thefailureof theapplicationconcerned.In a partitionedsys-tem,however, it is necessaryto ensurethatfaultyassumptionsaboutthetemporalbehaviorof tasksbelongingto oneapplicationcannotaffect thetemporalbehavior of applicationsindifferentpartitions.

Thereseemto be two ways to achieve this temporalpartitioning: one is a two-levelstructurein which the kernel schedulespartitions, with the applicationin eachpartitionSeehttp://www.research.microsoft.com/research/os/mbj/Mars_Pathfinder/

Authoritative_Account.html.


thenresponsiblefor locally schedulingits own tasks;theotheris a single-level structureinwhich thekernelschedulestasks, but with a quotasystemto limit theconsequencesof anyfaults—orfaultyassumptions—tothepartitionthatis in violation.

The first approachusuallyemploys staticschedulingat the partition level: the kernelguaranteesserviceto eachpartition for specifieddurationsat a specifiedfrequency (e.g.,20 ms every 100 ms) and the partitionsthenscheduletheir taskswithin their individualallocationsin any way they choose;in particular, partitionsmayusedynamicschedulingfor their own tasks. Any partition that schedulesits tasksdynamicallymust provide amechanismfor interruptingonetaskin favor of another. Suchsupportfor taskswappingisoneof the reasonsfor preferringdynamicover staticscheduling:it simplifiesapplicationprogrammingby allowing long-running,low-frequency tasksto be interruptedby shorterhigh-frequency tasks;whereas,statically scheduledsystemshave to breaklong-runningtasksinto separatelyscheduledfragmentsthat perform their own saving and restorationof local statedatato createroom for the higher-frequency tasks. If partition swappingusestherestorationmodel,however, it providesanalternative mechanismfor dealingwithlong-runningtaskswithin a staticallyscheduledenvironment: a singleapplicationcanbedivided into partsthat are allocatedto separatepartitionsthat are scheduledat differentrates.Thepartition-swappingmechanismthentakescareof interruptingandrestoringthelong-runningtasks,therebysimplifying theirconstruction.

Opportunitiessuchasthismakestaticschedulingfor bothpartitionsandtasksrelativelyattractive. Conversely, theconstraintsof staticpartitionschedulingrenderits combinationwith dynamictaskschedulingratherlessattractive. Oneof the conveniencesof dynamicschedulingis that it allows new tasksto be introduced—orthe frequency anddurationofexisting tasksto bechanged—withrelative ease.But this easeis vitiatedwhenpartitionsarestaticallyscheduledbecause,for example,a new 10-Hz taskcanonly be fitted into apartitionthatis alreadyscheduledat this rate(or somemultipleof it), sothattherigidity ofthepartition-scheduling mechanismdominatesany flexibility in taskscheduling.

This drawbackcouldbe overcome,however, if partitionscould bescheduledat itera-tion ratesvery muchhigherthanthoseof any task—say1,000timesa second.Undertherestorationmodelof partitionswapping,apartitionthatis scheduledatsucha rateandthatis guaranteed,say, onetenthof the CPU (i.e., 100 � s every millisecond)could, for mostpurposes,be regardedasrunningcontinuouslyon a CPU that hasonetenththe power oftherealone,andits taskscouldbedynamicallyscheduledwithout regardto theunderlyingpartition schedule.Partition swapsarerelatively expensive on traditionalprocessors(be-causethereis a large amountof stateinformationthat hasto besaved andrestored),andthis renderskilohertzpartitionschedulesinfeasibleon suchhardware(all theresourcesofthe systemwould be expendedin swapping). However, specializedprocessorsareunderdevelopmentwherepartitionswappingis performedat themicrocodeandhardwarelevels,andthesearebelievedto becapableof supportingpartitionschedulesin thekilohertzrangewith no morethan5% to 10%of thesystemresourcesexpendedon swapping.Noticethatthe taskswappingrequiredfor dynamicschedulingwithin eachpartitioncanberelatively


lightweight (sincetaskswithin a partition arenot protectedfrom eachother)andwill beactivatedata frequency comparableto thefastesttaskiterationrateandnot themuchfasterpartitionswappinggoingonbeneathit.

Theradicalcombinationof astaticpartitionscheduleoperatingatkilohertzratesanddy-namictaskschedulingwithin eachpartitionis anattractiveone:it seemsto provideboththeconvenienceof dynamicschedulingandthe predictabilityof staticscheduling.However,oneof theconveniencesof dynamicschedulingis theeasewith which it canaccommodateaperiodicactivitiesdrivenby externaleventssuchasoperator(e.g.,pilot) inputsanddeviceinterrupts,andit requirescareto supportthis on top of staticpartition scheduling—evenwhenthis is runningat kilohertzrates.Thebasicconcernis thatexternaleventsof interestto onepartitionmustnot disturbthetemporalbehavior of otherpartitions.If partitionsarescheduleddynamically, useof suitablequotaschemescanallow temporalpredictabilitytocoexist with aperiodicevent-driventaskactivations(this is discussedonpage27),but staticpartitionschedulingensurespredictabilitythroughtemporaldeterminismandthis imposesstrongrestrictionsonevent-drivenactivations.

First andmostobviously, a staticpartitionscheduledoesnotallow anexternaleventtoinitiate a partitionswap: thepartitionscheduleis drivenstrictly by theprocessor’s internalclock,sothat if aneventrequirestheservicesof a taskin a partitionotherthanthecurrentone,it mustwait until the next regularly scheduledactivation of the partition concerned.This increaseslatency, but may not be a problemif partitionsarescheduledat kilohertzrates. Lessobvious, perhaps,arethe consequencesof the requirementthat the currentlyexecutingpartitionshouldseeno temporalimpactfrom the arrival of eventsdestinedforotherpartitions.Even thecostof a kernelactivation to latchan interruptfor delivery to alaterpartition reducesavailability of theCPUto thecurrentpartitionandmustbestrictlycontrolled. It is possibleto addpaddingto the time allocatedto eachpartition to allowfor the costof kernelactivity usedto latchsomepredictednumberof interruptsfor otherpartitions.But this makestemporalcorrectnessof onepartitiondependenton theaccuracyof informationprovidedby others(i.e., thenumberandrateof their externalevents)—andevenoriginally accurateinformationmaybecomeuselessif a fault causessomedevice togenerateinterruptsconstantly.

This concernis a manifestationof a moregeneralissue:temporalpartitioningrequiresnotonly thateachpartitionhasaccessto theresourcesof thesystematguaranteedintervals,but that thoseresourcesprovide their expectedperformance.A CPU whoseperformanceis degradedby thecostof latchinginterruptsfor laterdelivery is just oneexample;othersincludea memorysubsystemdegradedby DMA transferson behalfof otherpartitionsoranI/O subsystemthatis busyon theirbehalf.

Understaticpartitionscheduling,temporalpartitioningis predicatedon determinism:becauseit is difficult to boundthebehavior of faulty partitions,theavailability andperfor-manceof eachresourceis ensuredby guaranteeingthatno otherpartitioncaninitiate anyactivity thatwill competewith thepartitionscheduledto accesstheresource.This meansthatnoCPUor memorycyclesmaybeconsumedotherthanat thebehestof thesoftwarein


thecurrentlyscheduledpartition.Thus,in particular, therecanbenoservicingof devicein-terruptsnorcycle-stealingDMA transfersotherthanthoseinitiatedby thecurrentpartition.Theserequirementscanbeviolatedin two ways:apreviouslyscheduledpartitionmayhavehadsomeI/O activity pendingwhen it wassuspended,or the externalenvironmentmaygenerateaninterruptspontaneously(e.g.,to indicatethatabuttonhasbeenpressed).

Draconianmeasuresseemnecessaryto prevent thesesourcesof temporaluncertainty.Externaleventseithershouldnot generateinterrupts(therelevantpartitionshouldpoll fortheeventinstead),or it shouldbepossibleto deferhandlingthemuntil therelevantpartitionis running(whetherthis is possibledependson the natureof the device andthe interruptandonhow selectively theCPUarchitectureallows interruptsto bemaskedoff). Similarly,interruptsdueto pendingI/O from a device commandedby a previouspartitionshouldbemasked off. If interruptscannotbe masked with sufficient selectivity, we could requirethe kernel to issuecommandsthat quiet the devices of the previous partition as part oftheprocessof suspendingthat partitionandstartingthenext. Alternatively, if devicesgoquietwhenuncommandedfor someshorttime, thekernelcouldmake thedevice registersunavailable(e.g.,by changingthe MMU table)during the final few millisecondsof eachpartition’s schedule.

Therestrictionsjustdescribedasnecessaryto ensurethattemporalcorrectnessof tasksin onepartitionareunaffectedby softwarein otherpartitionshave consequencesfor inter-partitioncommunications.With staticschedulingof partitions,ataskthatneedstheservicesof softwarein anotherpartition(e.g.,to accessa shareddevice) cannotsimply issuea pro-cedurecall. In fact,therecanbeno synchronousservices(i.e.,wherethecallerblocksandwaitsfor theserviceprovider to reply)acrosspartitionsbecause(a)onepartitionshouldnotdependon another(thatmaybefaulty) to unblockits progress,and(b) it would imposealargeperformancepenalty:thecallerwouldblock at leastuntil its next slot in thescheduleafter the serviceprovider’s slot. Instead,all inter-partition communicationmustbe asyn-chronous(wherethe callerplacesrequestsin the input buffers of tasksin otherpartitionsandcontinuesexecution;whennext activated,it looks in its own input buffers for replies,requests,andunsoliciteddatafrom otherpartitions). Becausefaulty softwarecould gen-erateanexcessive numberof requestsfor serviceby anotherpartition, it seemsnecessarythatfixedquotasshouldbe imposedon thenumberor rateof servicerequeststhatwill behonoredfrom eachpartition.

Someof therestrictionsthatarenecessarywhenpartitionsarescheduledstaticallymaypossiblyberelaxedwhenthey arescheduleddynamically. It makeslittle senseto schedulepartitionsdynamicallyandtasksstatically, andwhenboth partitionsandtasksaresched-uleddynamicallythereis little point in maintainingtwo levelsof scheduling,sotheunit ofschedulingwill actuallybethetask.However, theconcernfor temporalpartitioningwill in-fluencewhich tasksareeligible for execution.Whereasstaticschedulingensurestemporalpartitioningthroughstrictpreplanneddeterminism,dynamicschedulingrelieson theoremsfrom the mathematicalstudyof (for example)RMS. Thereare two problemsin apply-ing this theoryin thecontext of partitioning: oneis thata faulty partitionmayviolate the


assumptionsunderlyingthetheoremconcerned;theother(related)problemis thatthesim-plest (andtherefore,for life-critical applications,preferred)theoremsmake the strongestassumptions(e.g.,that context switchestake no time); whereas,thosewith morerealisticassumptionsreston moreelaborateandlesswell-establishedtheory. Both problemscanprobablybeovercomeby having thekernelandits schedulerenforcequotas.

For example,if schedulabilityof a tasksetis predicatedon a givenpartitiontakingnomorethan20% of the available time in eachcycle, thenthe kernelcansimply refusetomake any of its taskseligible for schedulingoncethat20%quotahasbeenreached.Theproblemwith this simpleschemeis that a faulty partitionmayconsumeits quotain verymany smallbursts(or a device maygenerateinterruptsat a rapidrate).Themany partitionswapsentailedtherebymayhave a moredeleteriouseffect on thetasksof otherpartitionsthan the CPU time directly consumedby the faulty task. A plausibleway to overcomethis problemis to subtractthe costof a partition swap (andthe performancedegradationcausedby disturbingthecaches)from thequotaof thetaskthatcausesit. Quotasmanagedin thiswayprovidemany of theguaranteesof staticschedulingwhile retainingsomeof theflexibility of dynamicscheduling.For example,sucha schemecould allow synchronousaswell asasynchronousinter-partitioncommunicationstogetherwith theability to serviceaperiodiceventsandinterrupts.(ModernoperatingsystemssuchasScoutusea somewhatsimilarapproachin whichaccountingfor resourceusageis performedonabstractionscalledpaths[102].) However, many of therestrictionsandconcernsdiscussedfor staticpartitionschedulingremainrelevant for dynamicscheduling:for example,it still seemsnecessaryto eliminatecycle-stealingDMA transfersandotherperformance-degradingactivities thatcannoteasilybecontrolledby quotas,andit is alsonecessaryto ensurethatinterruptsfor apartitionthathasexceededits quotaaremaskedor latchedattruly zerocost.Otherpotentialsourcesof cross-partitioninterferencesuchaslocksandsemaphoresmustalsobesuitablycontrolled(probablyby elimination).

Quota-baseddynamicschedulingmayprovide simpleguaranteesthatthetasksof non-faultypartitionsreceive theirexpectedallocations(i.e.,they receiveenoughtime),but guar-anteesthatthey will hit theirdeadlines(i.e., they getit at theright time)aremoreproblem-atical(thereare,for example,scenariosunderRMSwheretheearlycompletionof onetaskcausesanotherto missits deadline[85]). In practice,relatively few tasksmayneedto bescheduledwith greattemporalprecision: it is generallynecessaryto samplesensorsandcontrolactuatorswith very low jitter, but it doesnot greatlymatterwhenthecontrol lawsareevaluatedprovidedtheirresultsarereadywhenneeded.Thus,wecanenvisageaschemein which certaintasks(thoseassociatedwith sensorsandactuators)areguaranteedto exe-cutewith greattemporalaccuracy, while othersareguaranteedonly to get their allocationof resourcessometimeduring their period. To achieve this, the sensorandactuatortaskscould run in separateprocessorsthat arestaticallyscheduled(andcommunicatewith thedynamicallyscheduledcomputationaltasksthroughdual-portedmemory),or they couldrun at thehighestpriority in thedynamicallyscheduledsystem;justificationfor the latterschemewould requiredeepertheoremsthantheformer.


Whetherpartitionsandtasksarestaticallyor dynamicallyscheduled,the kernelmustcollaboratewith othersoftwareto providesomeof theservicesof anoperatingsystem—atthe very leastit will be necessaryto serviceinterrupts.Understaticpartitionscheduling,interruptsfrom externaldevicesareallowedonly whentheirpartitionis running;thismeansit is possibleto vectorinterruptsdirectlyto handlersin thepartitionratherthanhandlethemin thekernel.Theadvantageof theformerarrangementis thatit minimizesthecomplexityof thekernel; its difficulty is that interruptsareoftenvectoredin supervisormode,whichcanthreatenhardware-mediatedspatialpartitioning. Compromisearrangementshave thekernelfielding thehardwareinterrupt,but thenpassingit in a safeway to thepartitionforservice.Argumentsagainstdevicehandlingin apartitionarethatthis really is anoperatingsystemservicethat is betterdoneby an operatingsystem.A conventionaloperatingsys-temis unattractive in apartitionedenvironmentbecause,asportrayedin figure2.1(a),it is alargesharedresourcethatmustbeshown to respectpartitioningaswell asto befreeof otherfaults. A moresuitablearrangementprovidesoperatingsystemservicesseparatelywithineachpartition,asportrayedpreviously in figure2.1(b).Thisarrangementhastheadditionalbenefitthatdifferentpartitionscanusedifferentsetsof operatingsystemservices:for ex-ample(seefigure3.1),a critical functionmight usea minimal setof services(PartitionC),while a lesscritical but morecomplex functionmight employ somethingcloseto a COTSoperatingsystem(PartitionB), andadevice managementpartitionmight consistlargely ofstandardizedoperatingsystemservicesfor device management.Operatingsystemservicescannotaffect basicpartitioningin this arrangement;however, they mustbeusedwith greatcircumspectionin partitionsthatencapsulatea sharedserviceor resource(e.g.,a partitionthatprovidesa sharedfile system).Suchpartitionsarelogically anextensionof thekernelandmustbeshown to partitiontheir serviceor resourceappropriately—whichis likely tobemoredifficult themoresoftwarethey contain.

PartitionA

OSServicesA

Hardware

Kernel

OSServicesforDevicemanagement

OSServicesC

PartitionB

OSServicesB

PartitionC

DeviceManagementPartition

Figure3.1: DifferentOperatingSystemSoftwarefor DifferentPartitions

3.2. PartitioningAcrossaDistributedSystem 29

3.2 Partitioning Across a Distributed System

A distributedsystemresemblesour original Gold Standard—aseparateprocessorfor eachpartition—morecloselythana singlesharedprocessorandmight seemto raisefew newissueswith respectto partitioning: if we acceptthat the partitioning mechanismsem-ployed within individual processorsaresound,thenconnectingseveral suchsystemsto-gethersurelycannotdo any harm.This would betrueif we couldarrangededicatedphys-ical point-to-pointcommunicationsbetweenpartitionsin differentprocessors,but theonlyphysicalcommunicationsthatcanbeprovidedarebetweenprocessors. Thislimitationhasafairly significantimpact,which is compoundedwhenweconsidersharedcommunications,suchasbuses.

To startwith, supposewe wish to communicatedatafrom partition � � of processor�to partition � � in a differentprocessor� andthat we have a suitablecommunications

line from�

to � . Interruptswill begeneratedat � asthedatastartsto arrive and,aswediscoveredin theprevioussection,somecareis neededto ensurethat thesedo not disturbtemporalpartitioningin � . If � is dynamicallyscheduled,the quotaschemesdiscussedpreviously maybeall thatis needed,but matterscanbemorecomplicatedwhenpartitionsarescheduledstatically. Understaticscheduling,we mustrequireeitherthattheinterruptscanbe latchedat no costuntil the scheduledexecutionof partition � � (partitionsmustbescheduledat high-frequency to make it feasibleto servicecommunicationsin this way) orthat partition � � (or somedevice managementpartition that handlesthe communicationsline) is guaranteedto be executingwhenthe interruptsarrive. The latter clearly requiressynchronizationbetweenthepartitionschedulesof processors

�and � and,by extension

to otherprocessors,this impliesglobalsynchronizationof schedulesacrossall processors.Theonly wayto avoid theseconsequenceswhenstaticpartitionschedulingis employed

is to have a dataconcentratordevice at � that buffers incomingdatawithout imposingaloadontheCPUor its buses.Thepartition � � canthenretrieve incomingdatafrom thedataconcentratoraspart of its normally scheduledactivity. A moreaggressive designwouldallow the dataconcentratorto write incoming datadirectly into buffers associatedwitheachpartitionusingdual-portedRAM. Eventhesedesignsdonotnecessarilyeliminatetheneedfor globalsynchronization,however, becauseof theneedto control “babbling idiot”failuresin partitionsandprocessors.

Thesearefailureswherea transmittersendsdataconstantly—possiblyoverwhelmingits recipientor denying serviceto othertransmitters.Onescenariowould bea runaway inpartition � � thatcausesit to transmitto � � throughoutits scheduledexecution.We needtobesurethat this heavy loadon thecommunicationsline from

�doesnot affect theability

of the recipient( � or its dataconcentrator)to serviceits otherlines. This requireseithersomekind of quotaschemeat the recipientor a global schedulethat excludessimultane-oustransmissions.A babblingpartitioncando soonly during its scheduledexecution,soa global schedulemay be ableto ensurethat no two processorssimultaneouslyschedulepartitionsthat transmitto thesamerecipient.An alternative if � � doesnot drive thecom-


municationsline directly, but insteadsendsdatato a device managementpartition, is forthemanagementpartitionto imposeaquotaon thequantityof datathatit will acceptfromany onepartition. A babblingprocessoris anevenmoreseriousproblemthana babblingpartition;eithertherecipientmustbeableto toleratethefaultor it mustbepreventedat thetransmitter—mechanismsto do thisarediscussedin thecontext of buscommunications.

The measurespreviously discussedaddresstemporalpartitioning in inter-processorcommunications;we alsoneedto considerspatialpartitioning. The spatialdimensiontopartitioningrequiresmechanismsto ensurethatpartition � � of processor

�cansenddatato

partition � � in a differentprocessor� only if thatcommunicationis authorized.No addi-tional mechanismsarerequiredto ensurethis whena communicationline is dedicatedto aspecificinter-partitionchannel;additionalmechanismsareneeded,however, whenonelineis sharedamongmultiple receiving partitions.In this case,theaddressof the intendedre-cipientmustbeindicatedin eachtransmission.Thiscanbedoneexplicitly by includingtheaddressin thedatatransmittedor implicitly throughthetimeat which it is sent(thesched-ulesof thesendingandreceiving processorsmustbecoordinatedin this case).A concernwith explicit addressesis thatacommunicationsfaultcouldtransformadatumaddressedtopartition � � into oneaddressedto � � . This is afault-toleranceissueandis generallyhandledby checksumsor similar techniquesto ensuretheintegrity of transmitteddata.Therelatedpartitioningissueis theconcernthata fault in thesendingpartition � � couldcauseit to ad-dressdatadirectly to anunauthorizedrecipient� � —this fault will notbedetectedby checksums,sinceit occursoutsidetheirprotection.Theonly certainwayto containthisfault is tomediatethecommunicationwith sometrustedentitythathasindependentknowledgeof theauthorizedinter-partitioncommunications.This canbeperformedeitherat thetransmitter(e.g.,if adevice managementpartitionis usedto accessthecommunicationsline) or at thereceiver (e.g.,in adataconcentrator).A probabilisticmethodto containthefault is to allo-catepartitionaddressesrandomlyfrom avery largespace;thechancethata fault in � � willcauseit to manufacturethelegitimateaddress� � is thencorrespondinglysmall. In thecaseof implicit addresses,theconcernis thatby sendingdataat thewrongtime,thetransmittingpartitionwill causeit to bereceived by anunintendedrecipient. Mediationis requiredtocontainthis fault,which is consideredin moredetailin thecontext of buscommunications.

Somearchitecturesallow the componentsof a distributed systemto communicatewithout addingexplicit addressesto namethe intendedrecipient. In “publish-subscribe”architectures[82], for example, data is taggedwith a descriptionof its content (e.g.,air-data-samples) andrecipients“subscribe”to datacarryinggiven tags. Theseis-suesof namingandbindingwerediscussedearlierin thecontext of individual processors,andsimilar considerationsapplyhere,but with theaddedconcernfor fault tolerancewithrespectto communicationsfaults.

Usingseparatecommunicationslines to connecteachpair of processorsis expensive,sobusesaregenerallyusedin practice.A busis a departurefrom theGold Standard—itisa resourcesharedby all processorsandall partitions—andit is thereforecrucialto providepartitioningsothata fault in onepartitionor processorcannotaffect others.Thefaultsof


greatestconcernwith busesarethosewherea partitionor processoreitherbabblesor failsto follow theaccessprotocolin someway, sothatotherpartitionsor processorsaredeniedtimely accessto thebus.

A babblingor misbehaving partitioncannotinterferewith busaccessby otherpartitionsin its own processor(becauseapartitioncanaccessthebusonly whenit is scheduled),but itcaninterferewith accessbyotherprocessors(bycontendingfor thebusif thisis mediatedorby sendingtransmissionsthatcollidewith thoseof otherprocessorsif it is not),andit mayoverwhelmits receivers.A babblingor misbehaving processoris evenmoredisruptive thana babblingpartitionbecauseit is not constrainedby its own scheduleandcanmonopolizethe bus. Notice that processorfaultssuchas this arepartitioning—notfault-tolerance—issues,becausetheir consequenceswould not be so seriousif the buseswerenot shared.Dual or multiple busescanbe used,in the hopethat a babblerwill confineitself to justoneof them,but this cannotbeguaranteed.Theonly certainway to preventbabblingis tomediateeachprocessor’s accessto thebusby somecomponentthatwill fail independently.Thequestionthenis how doesthemediatorknow whatis alegitimatetransmissionandwhatis babbling?Theanswerdependsonwhethercommunicationsaretimeor eventtriggered.

In a time-triggeredsystem,transmissionsaredeterminedby a schedule,andthe me-diatingcomponentneedonly have anindependentcopy of its processor’s scheduleandanindependentclockin orderto determinewhetherits processorshouldbeallowedto transmiton thebus. The schedulesthatgovern time-triggeredtransmissionscanbeeitherlocal orglobal. A local scheduletreatseachprocessorindependentlyso that differentprocessorsmaycontendfor thebusandthereceiving partitionneednotbescheduledat thesametimeasthetransmitter. A globalschedule,on theotherhand,coordinatesall processorandbusactivity so that thereis no bus contention.Although it is perfectlyfeasibleto useglobalschedulingwith contentionbusessuchasEthernetor CAN (globalsynchronizationmeansthattheir ability to resolve contentionwill never beexercised,but thesystembenefitsfromthe low costandhigh performanceof the network interfacehardware),somespecializedbuseshave beendevelopedspecificallyto supportandexploit staticglobalscheduling.Ex-amplesincludethe ARINC 659 SAFEbusTM [2, 41] andthe Time TriggeredProtocolandits associatedArchitecture(TTP/TTA) [58]. With globalscheduling,thereis no realneedto includea destinationaddresswith thedata(becausethis is implicit in thetime themes-sageis sent)andsomeglobally scheduledbuses(e.g.,ARINC 659) do eliminateexplicitaddressestherebyreducingthenumberof bitsthatneedto becommunicatedandincreasingtheusefulcapacityof thebus.

Theclockof abusmediationcomponentneedstobeindependentof thatof its processor,but synchronizedwith it. With localscheduling,thepurposeof themediatingcomponentisto controlthepacingof busaccesses,but nottheirabsolutetiming and,for thispurpose,it isadequatefor themediatorandits processorto synchronizelocally (obviously, this mustbedonecarefullyto maintainplausibilityof theindependentfailureassumption).With globalscheduling,however, theclocksof all processorsandmediatorsmustbeglobally synchro-nized,and the mediatingcomponentsshouldperformthe synchronizationindependently


of their processors.If clock synchronizationis achievedby a high-level protocol,thenthemediatingcomponentsmustbecapableof interpretingthefull protocolhierarchy, andthisgreatlycomplicatestheirdesign.For this reason,themediatingcomponentsin TTA (calledbusguardians) do not performindependentclock synchronization,but take synchronizingsignalsfrom their hostprocessors[103]. This designpreventsbabbling,but a processorthatlosesclocksynchronizationwill take its busguardianwith it andwill still beableto ac-cessthebusat thewrongtime,thoughonly for shortperiods.However, theunsynchronizedprocessor/guardianpairwill alsobeunableto receivemessagescorrectly(becausesynchro-nizationis requiredto satisfytheCRCchecksoneachmessage),andtheguardianwill shutoff all busaccessafterfailing to receive asetnumberof expectedmessages.An alternativeapproachperformsclock synchronizationasa low-level protocolthatcanbeperformedbysimplemediatingcomponents.This approachseemsto requiresuitableelectricalproper-tiesof thebusandits drivers. In SAFEbus,for example,thesignalsfrom separatedriversareconnectedtogetheron the bus in a logical OR fashion,andthis allows a very simplesynchronizationprotocolthat is performeddirectly in themediatingcomponents(they arecalledBusInterfaceUnits in SAFEbus)[2].

Whereasglobally scheduledsystemsguaranteethat the bus will be free whena pro-cessoris scheduledto transmit,locally scheduledandevent-triggeredsystemsmustcopewith contentionbetweenprocessorsattemptingto transmiton the bus. In busesintendedfor control applications,contentionis not resolved probabilisticallyfollowing collisionsas it is in classicEthernet,but deterministicallyusingpreassignedslots(as in Echelon’sLON), a circulatingtoken (asin PROFIBUS [ProcessField Bus] [23]), or a priority arbi-trationscheme(asin CAN [ControllerAreaNetwork] [47]) to provide distributedmutualexclusionandtherebyprevent collisions. This determinismdoesnot provide very strongguaranteeson how long a processormustwait to accessthe bus, however. In CAN, forexample,a processorthat wishesto transmitmustfirst wait for any currenttransmissionto finish andthenit mustcontendwith any otherprocessorsthatalsowish to transmit. InCAN, the lowest-numberedprocessoralwayswins thearbitrationandmay thereforehaveto wait only aslong asthelongestmessagetransmission,while otherprocessorsalsohaveto wait while any lower-numberedprocessorsperformtheir transmissions.� It follows thatonly probabilisticguaranteescanbegivenonthebus-accessdelayin suchsystems,andthattheseguaranteeswill be quite weakin the presenceof faults[105], even if bus accessismediatedto controltheworstmanifestationsof babbling.

It is not straightforward to mediatea processor’s accessto thebuswhenthataccessiseventtriggered—thatis to say, triggeredby theprocessor’s internalcomputations,possibly�TheEchelonLON protocolhassimilar characteristics:stochasticflow controlis usedto reducethelikeli-

hoodof collisions;if acollisiondoesoccur, processorsbackoff andaccessthebusin orderof their “contentionslots.” Themainapplicationof theLON protocolis in automatingbuildings,wheretight real-timeguaranteesareunlikely to berequired,but theEchelonwebsitehttp://www.lonworks.echelon.com reportsthatRaytheonusesthis technologyin its Control-By-LightTM fault-tolerantfiber optic distributedcontrol system,which is currentlyundergoingFAA Part 25 certificationfor usein commercialaircraft;however, it seemsthatmechanismsin additionto theLON protocolareemployedin thisapplication.


basedondatait hasreceived—forthereis nowayto know whetheraneventhaslegitimatelyoccurredwithout independentlycopying thedatareceivedandreproducingthecomputationperformedby thatprocessor. A master-checker dual-processorarrangementsuchasthis isa very expensive way to prevent babbling. Redundantprocessorsareobviously requiredfor fault tolerancein IMA, but suchredundancy shouldbemanagedflexibly at thesystemlevel, not committedto pairing. Without master-checker pairs, the bestthat canbe doneto controlbabblingin anevent-triggeredsystemseemsto bethe impositionof somelimiton the rateat which a processormay transmiton thebus. TheARINC 629avionics databus[3] hasthiscapability(thebususestimeslotsto controlaccess,but it canbeusedin thelargercontext of anevent-triggeredsystem).

Becausethepurposeof partitioningis to controlfaultpropagation,someaspectsof par-titioning arevery closeto fault tolerance—forexample,thecontrolof babblingdiscussedin the previous paragraphshaselementsof both. Mechanismssuchas theseareneededto preserve the integrity of the serviceprovided by an IMA architectureto the avionicsfunctionsthatit supports.In addition,theavionicsfunctionsoftenneedto befault tolerantthemselves,andanIMA architecturemustthereforesupportthedevelopmentof suchfault-tolerantapplications.Thereis a choicein how muchfault toleranceshouldbeprovidedbytheIMA architecture,andhow muchby thefunctionsthemselves.Faultssuchasbabbling,which areoutsidethecontrolof any singlefunctionandthatcanhave system-wideramifi-cations,mustclearlybetoleratedby mechanismsof theIMA architecture.Sensorfailure,on theotherhand,seemsmorenaturallytheresponsibilityof thefunctionthatusesthesen-sor, while failureof a processorseemsto fall somewherein between—thedesignersof thefunctionmaybestknow how to handlesucha fault,but mayneedservicesprovidedby theIMA architectureto implementtheir strategy.

As mentionedin section2.1, thetrendtowardIMA runsin parallelwith anothertrendtoward developing avionics functionson top of a layer that provides standardoperatingsystemservicesand, possibly, additionalservicesto supportsystematicfault tolerance.Fault tolerancein critical systemsis usually basedon active redundancy; errorsare de-tectedor maskedthroughcomparisonor voting of theredundantlycomputedvalues.Faulttolerantarchitecturesdiffer in whetherthe redundantreplicasperform the sameor dif-ferent computationsand in whethertheir statesare synchronized(to allow exact-matchvoting). Someof the architecturalchoicesfor fault tolerancearestronglycontingentonotherchoices—forexample,thatbetweentime-andevent-triggeredarchitectures—thatarethemselvesstronglytied to choicesin partitioningmechanisms.Kopetzpresentspersua-sive argumentsthat time-triggeredarchitecturesare the bestchoicefor critical real-timeapplications[54–56] andthis choicealsofits well with therequirementsandmechanisms,discussedin theprevioussection,for ensuringtemporalpartitioningin adistributedsystem.


3.3 Summary

Thetopicsexaminedin thischaptershow thatpartitioninginteractsratherstronglywith sev-eralotherissuesin systemdesign:for example,scheduling,communication,distribution,andfault tolerance.By “interactswith” I meanthatdesignfreedomin thesedimensionsiscurtailedwhenpartitioningis a primarysystemgoal. This is not necessarilya badthing,however, becausetherestrictionsimposedby partitioningareexactly thosethatpreventun-expectedinteractionsamongsystemcomponentstherebypromotingcompositionality(i.e.,thepropertythatcomponentsthatwork ontheirown continueto dosowhencombinedwithothercomponents)andreducingintegrationcosts.

Becausepartitioningis critical to the safedeploymentof IMA, thedesignandimple-mentationof its mechanismsmustbe assuredto very high standards.Guidelinesfor theassuranceandcertificationof safety-criticalairbornesoftwarearespecifiedin thedocumentknown asDO-178Bin theUSA andED-12B in Europe[84]. Theseguidelinescall for avery rigorous—if traditional—processof reviews, analysis,anddocumentation;however,anappendixincludesformalmethodsamongthe“alternativemethods”that“may beusedinsatisfyingoneor moreof theobjectives”describedin thedocument.Theideabehindformalmethodsis to constructamathematicalmodelof asoftwareor systemdesignsothatcalcu-lationsbasedon themodelcanbeusedto predictpropertiesof theactualsystem—inmuchtheway thatfinite elementanalysisof a structuralmodelfor anairplanewing canbeusedto predictmechanicalpropertiesof the actualwing. Becausethe appropriatemathemati-cal domainfor modelingsoftwareis mathematicallogic, where“calculation” is performedby so-called“formal deduction”(asopposedto, say aerodynamics,wherethe appropri-atemathematicaldomainis partialdifferentialequations,andcalculationis performedbynumericalmethods),thisapproachis referredto asuseof “formal methods.”

The utility of calculation—as an adjunct to, or replacement for, physicalexperimentation—iswell understoodin other branchesof engineeringand is similar incomputerscience.In fact,its utility is potentiallygreaterin computersciencethanin otherengineeringdisciplinesbecausecomputersciencedealswith discreteor discontinuousphe-nomenawhereexperimentationandtestingareof limited valueasassurancemethods.Withdiscontinuoussystemstheremaybelittle relationshipbetweenthebehavior of thesystemin onecircumstanceandits behavior in another“similar” circumstance;consequently, ex-trapolationfrom testedto untestedcasesis of doubtfulvalidity. Thiscontrastswith physicalsystems,wherecontinuityjustifiessafeextrapolationfrom limited testcases.Formalmeth-odsaugmenttestingby allowing all the behaviors of a systemto be examined. Formalmethodsconsidera modelof the system,whereastestingexaminesthe real thing, so thetwo approachescomplementeachother. An elementarydescriptionof formal methods,andtheir applicationto thecertificationof avionics is presentedin [92], with moredetailavailablein [91].

In additionto their role in assurance,the modelsconstructedin formal methodscanoftenhelpclarify requirementsanddesignchoicesandcanleadto improvedunderstanding

3.3. Summary 35

of designproblems. They do this by abstractingaway all detail consideredirrelevant totheproblemat handandby formulatingtheremainingissueswith mathematicalprecision.Formalmodelsfor partitioningcouldthereforehelprefineour understandingof this topic.Now, partitioninghasmuchin commonwith certainissuesin computersecurity, andthoseissueshave beenthe target of considerableresearchin formal modelingextendingovermorethantwo decades.Thenext chapter, therefore,examinesissuesin computersecurityrelatedto partitioningandoutlinestheformalmodelingtechniquesthathave beentried.

36

Chapter 4

Comparison With ComputerSecurity

Computersecurityis relatedto partitioningin that both areconcernedwith the ability ofonesoftwareapplicationto influenceanother. Theconcernsarethatsensitive informationmight “leak” from onepartition to another(this is called informationflow in the securitycontext) or that doubtful informationmight contaminatehigh-qualityinformation(this iscalledinformationintegrity in thesecuritycontext) or thatonepartitionmight monopolizeor reducetimely accessto theCPUor someotherresource(this is calleddenialof servicein thesecuritycontext). Muchwork overmany years(see,for example,asurvey publishedin 1981[61]) hassoughtto provide a firm understandingof thesesecurityissuesandtheirenforcementmechanisms,andwe might hopeto applysomeof this work—or at leasttheunderlyingideas—topartitioning.In addition,researchin computersecurityhassoughttoprovide rigorous,formalapproachesto thespecificationandverificationof securesystems,andthereis hopethattheseapproachescouldcontributeto thedevelopmentof strongassur-ancetechniquesfor partitioningin avionics. The following sectionsreview thesesecurityissuesandtheformalmodelingtechniquesthathavebeenappliedto them.Thegoalhereisto explainthebasicideasandapproaches,sowemerelydescribetheformal techniquesthathave beenusedratherthanpresenttheactualformalism.

4.1 Data and Information Flow

Themoststudiedaspectof computersecurityis somethingof adualto oneof theconcernsof spatialpartitioning. In spatialpartitioning,a concernis that onepartition might writedatainto a secondandtherebydisrupt its operation. In securitywe aremoreconcernedwith thedatathatis written: if datain thefirst partitionis consideredhighly classified,thenwriting it into a morelowly classifiedpartition is tantamountto disclosingit. Reflectingthis concern,computersecuritygenerallyusesthemoreneutraltermprocessfor whatwas

37

38 Chapter4. ComparisonWith ComputerSecurity

calleda partitionin thepreviouschapter(indeed,thecomputersecuritynotionthatis clos-est to partitioningis called“processsecurity” [9]). Data flow securityis concernedwithcontrollingchannelsfor disclosure;informationflow securityis concernedwith moresub-tle channelsin which datais not written directly, but its informationcontentis disclosedjustaseffectively.

4.1.1 Access Control

A basicmechanismin enforcingbothpartitioningandinformationflow securityis calledaccesscontrol: the computersystemis assumedto have somemeans(typically, supervi-sor/usermodedistinctionsandmemorymanagementhardware)for limiting the primitiveresourcesthata processcanaccessandthewaysin which it canaccessthem. Thensomehigher-level resourcesaresynthesized(e.g.,a file system),andrulesgoverningaccesstothoseresourcesaredefinedandimplementedin termsof themoreprimitive resourcesandprotections.The rulesconstitutean accesscontrol policy. A familiar exampleis that oftheUnix file system:eachfile is associatedwith a particularownerandgroup,andwe canspecifyseparatelywhethertheowner, membersof thegroup,or otheruserscanread,write,or executethe file. This exampleraisestwo importanttopics in accesscontrol: the firstconcernsthechoiceandspecificationof theaccesscontrolpolicy thatis to beenforced,andthesecondconcernsthecompletenessof thatenforcement.

The Unix file systemprovidesa discretionaryaccesscontrol policy: userswho havereadaccessto a file can,at their discretion,copy it andgrantaccessto thecopy in any waythey choose.This maybe contraryto the intent of the original owneror to someorgani-zationalpolicy. To dealwith theseconcerns,variousmoreconstrainedkindsof mandatoryaccesscontrolpolicieshave beendefined.Thesimplestexampleis themultilevel securitypolicy that is intendedto reflectpracticesfor handlingclassifiedmilitary information.In amultilevel policy, every resourceandevery process(computersecurityusesthe termsob-jectandsubjectfor these)is givena labelfrom someorderedset(typically UNCLASSIFIED,CONFIDENTIAL, SECRET, andTOP SECRET), andasubjectmayhave readaccessto anob-jectonly if thesubject’s label(its clearance) is equalto or greaterthanthatof theobject(itsclassification). � This rule (it is calledthesimplesecurityproperty) doesnot stopa subjectfrom creatinga copy of anobjectat a lower classificationandtherebyviolating the intentof the policy, so it is augmentedby anotherrule calledthe � -property (pronounced“starproperty”) that saysthat a subjectmay have write accessto an objectonly if the object’slabelis equalto or greaterthanthatof thesubject.Thecombinationof thesimpleandthe �properties(i.e.,a subjectcanonly read“down” andwrite “up” in securitylevel) constitutethehistoricallysignificantBell andLa Padulasecuritypolicy [10]. Underfurtherexamina-tion, thispolicy raisesimportantquestionsthatwill beconsideredshortly. First, though,wereturnto therelatedquestionof completenessof anaccesscontrolpolicy.�Mattersarecomplicatedin practiceby the useof compartments(e.g., NATO, NOFORN) in combination

with thebasicclassificationsto createapartialordering.

4.1. DataandInformationFlow 39

Theaccesscontrolpolicy of theUnix file systemcanbebypassedif userscandirectlyreador write the contentsof the disk on which thosefiles arestored.Thus,althoughourinterestis in protectingfiles, we alsoneedto be concernedaboutthe disk, andpossiblyotherelementsof thesystemaswell. So the issueof completenessin accesscontrolcon-cernshow muchof thesystemneedsto beplacedunderaccesscontrol,andin whatway,for usto besurethat theresourcewe actuallywant to protectis, indeed,protectedagainstall possibleattacks.This issueis complicatedby thefact thatsecurityis really aboutpro-tectinginformation, not meredata,so thatany channel(a metaphoricalexamplewould beby tappingon thewalls in Morsecode)thatallows the informationcontentof a file to beconveyedto anunauthorizeduseris asdangerousastheability to copy afile directly.

Thepossiblechannelsfor informationflow canbequitesubtleandhardto detect(therewereat leasttwo in Bell andLa Padula’s “Multics Interpretation”[10]). For example,sup-posewehadaspecialUnix systemthatimposedtheBell andLaPadulapolicy onfile access,but with theadditionalpropertythatfile namesarerequiredto beuniqueacrossall users:anattemptto createafile with anexistingnamereturnsanerrorcode.Then,aSECRET processcanconvey informationto anUNCLASSIFIED oneby creatingfileswith prearrangednames:the UNCLASSIFIED processretrievesthe informationby checkingwhetheror not it is ableto createfiles with thosenames.This is anexampleof a “covert” channel;moreparticu-larly, it is a covert storage channel(becauseit exploits informationstoredin thedirectorystructureof the file system;the otherkind of channelusestiming information—seesec-tion 4.3) [60,65]. Thechannelis noisy(someother, innocent,processmight have createdfiles with thosenames),but codingtechniquesallow informationto betransmittedreliablyover noisy channels.Covert channelsareof concernfor two reasons:first, they canbeusedto transmitinformationatsurprisinglyhighbandwidth(oneearlydemonstrationdrovea teletypeat full speedusinga channelthat dependedon sensingwherea disk headwaspositioned[95]), andsecond,they areno differentin conceptfrom moreblatantchannels(e.g.,theunprotecteddisk) that leave a resourceopento directaccess(botharesymptomsof incompleteness)—sothatunlesswe have methodsof specificationandverificationthatareableto eliminatesubtlecovert channelswe have little guaranteethatwe caneliminateany channelsat all.

It might seemthat informationflow andcovert channelsareesotericsecurityconcernsand that only basicaccesscontrol is relevant to partitioning. However, while it is truethatcovert informationflow maybeof little concernfor partitioning(becauseit dependsoncollusionbetweensenderandreceiverandis thereforeimplausibleasafaultmanifestation),the mechanismsusedfor suchflow definitely areof concern.Consider, for example,theunique-file-namechanneldescribedabove. This servesasa channelfor informationflowbecauseonesubjectcanaffect thebehavior perceived by another(i.e., whetheror not theattemptto createa file returnsanerror),andthis is surelycontraryto the expectationsofpartitioning—foroneinterpretationof thoseexpectationsis thatthebehavior perceivedbysoftwarein any given partitionshouldbe independentof the actionsby softwarein otherpartitions. We might try to arrangefor this expectationto be satisfiedin the presenceof


the unique-file-namerestrictionby allocatingdisjoint namespacesto eachpartition. Butthena fault in the softwareof onepartition could causeit to createa file from another’snamespace—andtherebycausea subsequentfile creationin that otherpartition to fail.Thisexampleshows thatcovert channelsfor informationflow raiseissuesthatarerelevantto partitioningandthatexaminationof how securityhasdealtwith thesechannelsmaybeof usein partitioning.

Anotherpotentialproblemwith accesscontrolformulationsof securityis thatthey de-pendon informalunderstandingsof what“read” and“write” accessesreallymean.Wecanconstructperversesystemsin which thesetermsaregiven incorrect(e.g.,reversed)inter-pretationsandthatsatisfytheletterof anaccesscontrolpolicy while violatingits spirit [72].

Covert channelsandperverseinterpretationsareboth symptomsof the real problemwith accesscontrolaswe have usedit: it is a mechanismfor implementing, not an instru-mentfor specifying, securitypolicies.An adequatespecificationshouldgetat the“intent”thatunderliesa securitypolicy in a convincing manner. It shouldthenbepossibleto provethatan implementationin termsof accesscontrolcorrectlyenforcesthepolicy. Problemsof completeness,covert channels,andperverseinterpretationsshouldall beeliminatedbyasoundapproachof thiskind. Thenext sectionexaminessuchapproaches.

4.1.2 Noninterference

To repairtheproblemswith accesscontrol,we needto bemoreexplicit aboutour systemmodel:weneedto specifyhow asystemcomputesandinteractswith its environment,howinputsandoutputsareobserved,andhow subjectsandobjectsareidentified.Thenwe canspecifysecurityin termsof constraintson the observablebehavior of the systemwithoutneedingto describemechanismsto enforcethoseconstraints(althoughwe would hopetobeableto describesuchmechanismsata laterstageof developmentandto verify thattheyenforcethedesiredpolicy).

The mostsuccessfultreatmentsof this kind areall variationson a formulationcallednoninterferencethat wasintroducedby GoguenandMeseguerin 1992[33], althoughthekey ideawasadumbratedfive yearsearlier[30]. That key ideais that if thereis no flowof informationfrom onesecurityclassificationto another, thenthebehavior perceived bysubjectsof the second(“lower”) classificationshouldbe independentof any activity bysubjectsof the first (“higher”) classification.In particular, the behavior perceived by thesecondclassificationshouldbeunchangedif all activity by thefirst is removed.Theprecisedetailsdependon theformalmodelof computationemployed,but thetraditionaltreatmentusesa finite automatonasthesystemmodel: theautomatonchangesstateandproducesanoutputin responseto inputs,which arelabeledwith their securitylevel. A relation� � indicateswhetherlevel � is allowed to convey informationto or interfere with level ; itsnegationis thenoninterferencerelation !� , whichisconsideredaspecificationof thedesiredsecuritypolicy. A sequenceof inputs " is purged for level � by removing all inputsfromlevels that may not interferewith � ; this purgedinput sequenceis denoted"$#%� . Starting


from someinitial state &(' , the stateof the automatonafter consumingthe input sequence" is )+*-,.�/& '10 ".� , while that after consumingthe purged sequenceis )+*-,.�/& '10 ".#%�2� . Thenoninterferenceformulationof securitythenrequiresthat any level � input mustproducethesameoutputin both thesestates.The intuition is that this ensuresthatno experimentconductedat level � canreveal anything aboutthe presenceor absenceof earlier inputsfrom levelsthatshouldnot interferewith � .

Thenoninterferenceformulationof securityis statedin termsof asystem’s behavior inresponseto asequenceof inputs.An unwindingtheoremreducesthis to threeconditionsonits behavior with respectto individual inputs.Theseconditionsarestatedin termsof eachlevel’s “view” of thesystemstate(intuitively, if thesystemstateis thoughtof asconsistingof differentcomponents“belonging”to eachlevel, thenlevel � ’sview of thestatecomprisesits own componentandthecomponentsof all the levels thatareallowed to interferewith� ). If thelevel � views of two statesarethesame,we saythesestates“look thesameto � ”(technically, this is anequivalencerelationonstates).

Output Consistency: if two stateslook thesameto � , thena level � input mustproducethesameoutputin bothstates.

Step Consistency: if two stateslook thesameto � , thenthestatesthatresultfrom applyingthesameinput (of any level) to bothstatesmustalsolook thesameto � .

Local Respect (for � ): thesystemstatemustlook thesameto � beforeandafteraninputof a level thatis noninterferingwith � .

It is straightforward to prove that theseconditionsaresufficient to imply noninterference.Theproof is formalizedandmechanicallyverifiedin oneof thetutorialsfor thePVSveri-ficationsystem[94].

A connectionbetweenthenoninterferenceandaccesscontrolnotionsof securitycanbeestablishedby interpretingtheunwindingconditionsin accesscontrol terms.We supposethat the systemstateis a function from objectsto valuesandthateachobjecthasa level.Inputsof level � arereinterpretedasactionsperformedby a subjectof level � . Thenwesupposethataccesscontrolenforcesthefollowing ReferenceMonitor Assumptions.

3 Theoutputproducedby anactiondependsonly on thevaluesof objectsto which thesubjectperformingtheactionhasreadaccess.

3 If anactionchangesthevalueof any object,thenits new valuedependsonly on thevaluesof objectsto which theperformingsubjecthasreadaccess.

3 An actionmaychangethevaluesonly of objectsto whichtheperformingsubjecthaswrite access.

With theseassumptions,accesscontrolcanenforcetheunwindingconditionsby settingup thecontrolsasfollows (theseareessentiallytheBell andLa Padulaconditions).


1. If �4� , thenthe objectsto which subjectsof level � have readaccessmustbe asubsetof thoseto whichsubjectsof have readaccess,and

2. A subjectof level � mayhave write accessto anobjectfor which asubjectof level hasreadaccessonly if �5�6 .

The connectionbetweenthe two formulationsis establishedby interpretinga subject’s“view” as the valuesof all the objectsto which it has readaccess. A proof is givenin [90, section2.1]. The proof requiresformalizing the referencemonitor assumptions,which is surprisinglydifficult to do correctly(PopekandFarber[83], whofirst recognizedtheimportanceof theseconditions,madeerrorsin formalizingthem).

Contraryto early expectations(e.g., reference34), standardnoninterferencerequiresthe interferesrelation � to be transitive [90]. All suchtransitive relationsareequivalentto multilevel securitypolicies,andthetwo conditionson accesscontrolenumeratedin theprevious paragraphare likewise equivalent to the Bell andLa Padulapropertiesin thesecases[90, section3.1].

Becausethey imply apartialorderingonsecuritylevels,multilevel securitypoliciesdonot seemto capturethe concernsof partitioningall that closely, but intransitivepolicies(that is, thosewhere� is not requiredto betransitive) seemmorepromising.Intransitivepoliciescapturetheadditionalsecurityrestrictionsknown aschannelcontrol [88] or typeenforcement[13], which areconcernednot only with whetherinformationmayflow fromoneplaceto anotherbut with thepathsthroughwhichit mayflow. Channelcontrolsecuritypoliciescanberepresentedby directedgraphs,wherenodesrepresentsecuritydomainsandedgesindicatethedirectinformationflowsthatareallowed.Theparadigmaticexampleof achannel-controlproblemisacontrollerfor end-to-endencryption,asportrayedin figure4.1.

Plaintext messagesarriveat theRED sideof thecontroller;theirbodiesaresentthroughtheencryptiondevice (CRYPTO); their headers,whichmustremainin plaintext sothatnet-work switchescan interpretthem,aresentthroughthe BYPASS. Headersandencryptedbodiesarereassembledin theBLACK sideandsentoutontothenetwork. Thesecuritypol-icy we would like to specifyhereis therequirementthattheonly channelsfor informationflow from RED to BLACK mustbethosethroughtheCRYPTO andtheBYPASS (it is a sepa-rateproblemto specifywhatthosecomponentsmustdo). Noticethat theedgesindicatingallowedinformationflows in thisexamplearenot transitive: informationis allowedto flowfrom RED to BLACK via theCRYPTO andBYPASS, but mustnotdosodirectly.

Noninterferencecanbe extendedto intransitive policiesby substitutinga morecom-plicatedpurge function for the standardone. When �7!� , the usualrequirementis thatdeletingall actionsperformedby � shouldproducenochangein thebehavior of thesystemasperceived by . This is too strongif we alsohave the assertions�8� 9 and 9:� .Surelywe shouldonly deletethoseactionsof � that arenot followed by actionsof 9 (inthe CRYPTO example,RED, BLACK, andBYPASS or CRYPTO take the rolesof � 0 0 9 , re-spectively). This insight,anda definitionof thegeneralizedpurge function,weregivenbyHaighandYoung[37], togetherwith correspondingunwindingconditions.Unfortunately,


;;

<

;

=

;

Crypto

BlackRed

Bypass

Figure4.1: AllowedInformationFlows for anEncryptionController

oneof their unwindingconditionsis incorrect;correctconditionsweregiven,andformallyverified,by Rushby[90]. Theseunwindingconditionssimply replacethestepconsistencyconditionby aweakform.

Weak Step Consistency: if two stateslook thesameto � , andalsolook thesameto , thenthestatesthatresultfrom applyingthesameinput of level to bothstatesmustalsolook thesameto � .

Thecorrespondingconditionsfor accesscontrolenforcementconsistsimply of thesecondof thetwo conditionsgivenonpage41.�

Theformalstatementsof standardandintransitive noninterferenceuseanautomatonastheirformalsystemmodelandthereforeapplystraightforwardlyonly to asinglemonolithicsystem.To examinetheinteractionsof multiple,distributedsystems,moregeneralmodelsarerequired—forexample,transitionrelationsor processalgebras—andit is necessarytoadmit nondeterminism.Nondeterminismarisesnaturally in concurrentsystemsbecausethereis generallyno system-widecoordinationof the rateat which differentcomponentsproceed;hence,interactionscanoccurin differentordersin otherwiseidenticalruns,andthebehaviorsperceivedin thoserunscandivergemarkedly(thisis why it is sohardto debug�This might seemto suggestthatthefirst conditionon page41 is impliedby thesecondwhenthepolicy is

transitive. In fact,this is notnecessarilysofor agivensetof accesscontrols,but it will bepossibleto constructanotherset(i.e.,a differentassignmentof readandwrite permissions)thatwill satisfybothconditions.This isaconsequenceof the“nestingproperty”for transitivepolicies[90, Theorem5].


concurrentsystems).To accommodatethis system-level nondeterminism,noninterferencefor concurrentsystemsis formulatedto requirethatthesetof behaviors possiblein a givenscenariois unchangedat a given level wheninteractionsarepurgedin somesuitableway.Oneproblemwith thisformulationis thatit doesnotdefineapropertyin thetechnicalsense.

A systemcanbeidentifiedwith thesetof runsthat it canproduce(a run is a sequenceor “trace” of inputs,outputs,andothersignificantinteractions).A specificationis likewisea setof runs,anda systemsatisfiesa specificationif its runsarea subsetof thoseof thespecification.A setof runsis calleda property, sothatspecificationsandsystemscanbothbe consideredproperties.Specialclassesof propertiescalledsafetyand livenessplay animportantrole in formal methodsof analysis,andit canbeshown thatevery propertycanbeexpressedastheconjunctionof asafetyandalivenessproperty[5]. Security, however, isnota propertyin thissense:it is nota setof runs,but asetof setsof runs[73]. Thismeansthat standardmethodsfor deriving or verifying an implementationthat satisfiesa givenspecificationdonotwork for security—becausethesemethodsapplyonly to properties.

Anotherproblemwhennoninterferenceis extendedto concurrentsystemsin themannerjust describedis that it is not compositional: that is, two systemsindividually satisfyingsomenoninterferencepolicy canbe combinedto yield a compositesystemthat doesnotsatisfythatpolicy [71]. Many alternativeformulationsof noninterferencewereproposedforconcurrentsystemsin theattemptto overcomethisunattractive result.Unfortunately, thosethat were compositionalwere either very unintuitive (having no plausibleinterpretationasa naturalsecurityconcern)or wereexcessively restrictive (andunlikely to be satisfiedby practicalsystems).A partial resolutionwasprovided by Roscoe,who suggestedthatthe difficulty wasdueto a failure to appreciatethe significanceof nondeterminismwhencontemplatingsecurity[86].

Theproblemwith nondeterminismis thatit cansometimesberesolvedin awaythatde-pendsonunsecureinformationflow. A typicalexamplewouldbeasystemwith two levels,LOW andHIGH whereHIGH is requiredto benoninterferingwith LOW. Inputsto LOW causethe outputsodd or even to be generatednondeterministicallyunlesstherehave beenhighinputs,in whichcasetheLOW outputis oddor evenaccordingto theoddnessor evennessofthelastHIGH input(theHIGH inputsareassumedto bepositive integers).Thisexamplesat-isfiesmostdefinitionsof noninterferencefor concurrentsystemsbecausethesetof possiblebehaviors observableat the LOW level is unchangedby thepresenceor absenceof HIGH-level activity—yet it plainly violatesany reasonableinterpretationof “securesystem.” Theviolation is exposedwhenthesystemis composedwith onethatgeneratesonly evennum-berson theHIGH input. Roscoeexcludedsuchparadoxicalconstructionsby requiringtheircomponentsystemsto have behavior that is deterministicat eachsecuritylevel. Roscoe’sinsistenceon determinismalsosuggestsa resolutionto anotherdifficulty thathadplaguedmostearliertreatments:noninterferenceis not preservedunderrefinement.Refinementinthis (processalgebra)context meansa reductionin nondeterminism,andit posesthesamechallengeto noninterferenceascomposition.Roscoe’s treatmentis couchedin theformal-ismof CSP[39], whereaprocessis deterministicif it is freeof “divergence”andnever has


a choicebetween“accepting”and“refusing” anevent [87]. Therelationshipbetweenthistreatmentandtraditional interpretationsof determinismandsecurityin statemachinesisonethatrequiresclarification.

Thereis anothersenseof refinementfor which securityin general(not only its non-interferenceformulations)is not preserved. This is the notion of refinementin the senseof elaboration,wheremoremechanismsanddetailsareaddedto a specificationin orderto obtainan implementationthat is effectively executable.Under the standardnotion ofcorrectnessfor suchrefinements,it is necessaryonly to show that the propertiesof thespecificationareimplied by thoseof the implementation:the implementationis requiredto do at leastasmuchasthe specification,but it is not prohibitedfrom doing more. Animplementationof the specificationsuggestedby figure 4.1, for example,must provideat leastthe four communicationschannelsshown, but the standardnotion of correctre-finementwould not prevent it addinga direct communicationschannelbetweenRED andBLACK—despitethe fact that theabsenceof sucha channelis thewholepoint of the de-sign. For security, it is necessaryto constrainthenotionof correctrefinementso that theimplementationdoesnot addcapabilitiesthat areabsentin the specification.Clearly theimplementationmustcontainmoredetailsandmechanismsthanthe specification(elseitis surelynot an implementation),but for securerefinementthesemechanismsanddetailsmusthave no consequenceson the behavior that canbe perceived at the originally speci-fied interfaces.Theformalcharacterizationof this requirementis givenin termsof faithfulinterpretationsandis dueto Moriconi, Qian,Riemenschneider, andGong[75].

4.1.3 Separability

UsingRoscoe’s perspective, anadequatetreatmentfor distributedchannel-controlsecuritymight be achieved by taking the nondeterministiccompositionof deterministicsystems,eachcharacterizedby intransitive noninterference.Somearchitecturalrefinementto amoredetailedimplementationlevel could beobtainedusingfaithful interpretations,andthe re-strictionswithin eachsystemcouldthenbeenforcedby accesscontrol,usingthederivationfrom theunwindingconditionsdescribedearlier. (As farasI know, nobodyhasdeterminedwhethertheformaldetailsof thevariousmodelssupportthiscombination,norwhethersat-isfactorypropertiescanbederived for thecombination,but it seemsplausible.)However,the resultingmodelwould still be ratherabstractfor the purposeof deriving, for exam-ple,conditionsonhow asingleprocessorshouldimplementtheRED, BYPASS, andBLACK

componentsof figure4.1(theCRYPTO is usuallyanexternaldevice).An approachcalledseparability wasproposedfor this problemby Rushby[88]. The

idea is easiestto understandwhenno communicationsareallowed betweenthe separatecomponents.Thenthe ideais that the implementationshouldprovide theappearanceof aseparate,dedicatedprocessorto eachcomponent.Therealprocessoris timeshared,sothatit sometimesperformsinstructionson behalfof onecomponentandsometimeson behalfof another. The requirementsfor separabilitycan be expressedin termsof abstraction


functionsthatgive the“view” of theprocessorperceivedby eachcomponent.For example,if we have just two components,RED andBLACK, and >@? and >@A denotetheir respectiveabstractionfunctions,thentherequirementwhentheprocessoris executinginstructionsonbehalfof RED is thatthefollowing diagramshouldcommute.

< ;

;

<

> ?> ?

B � ?

B �Thatis to say, thestatechangein thephysicalprocessorcausedby executingtheinstructionop shouldbeconsistentwith executionof the “abstract”operationC+D ? on RED’s view oftheprocessor. At thesametime, BLACK ’s view of theprocessorshouldbeunchanged,asexpressedby thefollowing diagram.

;B �

>@A >@A

E F

BecauseI/O devicescandirectlyobserve andchangeaspectsof therealprocessor’s in-ternalstate(by readingandwriting its deviceregisters,for example),andcanalsoinfluenceits instructionsequencingmechanism(by raisinginterrupts),theactivity of thesedevicesis relevant to security. Consequently, we must imposeconditionson their behavior. Ex-pressedinformally (andonly from the RED component’s point of view), theseconditionsarethefollowing.

1. If > ? �HGI�KJL> ? �HMN� andactivity by a RED I/O device changesthe stateof the realprocessorfrom G to G2O , andthesameactivity wouldalsochangeit from M to M-O , then> ? �HG O �PJQ> ? �HM O � (i.e., statechangesin the RED view causedby RED I/O activitymustdependonly on theactivity itself andthepreviousstateof theRED view).

2. If activity by a non-RED I/O device changesthestateof therealprocessorfrom G toM , then >@?R�HGI�PJS>@?T�HMN� (i.e., non-RED I/O devicescannotchangethe stateof theRED view).

3. If > ? �HGI�UJV> ? �HMN� , thenany outputsproducedby RED I/O devicesmustbethesamein bothcases.

4. If > ? �HGI�WJX> ? �HMN� , thenthenext operationexecutedonbehalfof theRED componentmustalsobethesamein bothcases.

4.2. Integrity Policies 47

Separabilitywasproposedbeforeformal treatmentsof concurrentsystemshad beenfully developed,so the justificationof the above conditionspresentedin [89] is not fullysatisfactory. Furthermore,neitherthe informal nor the formal presentationdealswith al-lowedcommunicationschannelsbetweencomponents.Theproposalin [88] is to removethe mechanismsintendedto provide the desiredcommunicationschannelsand thenver-ify, usingthe conditionsabove, that the componentsof the resultingsystemareisolated.Jacob[48] notedthat this doesnot excludea particularkind of covert channel(calleda“legitimate”channel)thatpiggybacksundesiredclandestinecommunicationon thedesiredchannel.

A moremoderntreatment[93] derivestheconditionsfor separabilitywith communica-tions from thosefor intransitive noninterference.This treatmentweakensthe “triangular”commutative diagramof strict separabilityso that it appliesonly if RED !� BLACK (thisderivesfrom the“local respectfor !� ” unwindingcondition)and,whenRED � BLACK, itreplacesthe“rectangular”diagramby thefollowing condition(whichis basedonthe“weakstepconsistency” unwindingcondition).

> ? �HGI�UJV> ? �HMN��YZ> A �HGI�UJV> A �HMN�\[]> A �^C+DW�HGI�^�WJV> A �^C+DU�HMN�^�

Notice that this last conditiondoesnot usethe abstractoperationB � ? that appearsin the“rectangular”commutative diagram. This is becausewe do not really carewhat this op-erationis, only that > A �^C_DU�HGI�^� shouldbe functional in > A �HGI� , and the formula aboveexpressesthisdirectly.

4.2 Integrity Policies

The previous sectionshave consideredcomputersecuritynotionsrelatedto the undesireddisclosure of information.Therearesimilarnotionsrelatedto themodificationof informa-tion, wherethemainconcernis to ensureintegrity of theprotectedinformation.Integrity isrelatedto the“reliability” or “quality” of information:informationof high integrity shouldnot beallowedto becomecontaminatedby informationof low integrity. This requirementcanbe treatedasa strict dual to the Bell andLa Padulasecuritypolicy (that is, a subjectcanonly read“up” andwrite “down” in integrity level) andis known astheBiba integritypolicy [12].

Clark andWilson [15] arguedthattheintegrity of informationis alsoa functionof theoperationsthatareperformedonit andtheidentityof thosewhoinvoke thoseoperations.Ausershouldnotbeableto invokearbitraryoperationsonhigh-integrity information,but onlycertainwell-formedtransactions, andtheadmissibletransactionsmight bedeterminedbythestateof thedata,theidentityof theuser, andotherfactors.In commercialenvironments,the transactionsavailable to a userareoften governedby requirementsfor separation ofduties: auserwhoauthorizesapurchaseshouldnotbethesameastheonewhoselectsthevendor.


Othersimilarmodelsfor integrity have beenproposed,andtherehasbeenconsiderableinvestigationof whethertheseor theClark-Wilson modelcanbeenforcedby adaptationsof securitymechanismsdevelopedto controldisclosure[79].

4.3 Timing Channels and Denial of Service

Most work on formalizing securityhasfocusedon the dataand informationflow issuesdescribedin theprevioussections.In partitioningterms,theseall concernissuesin spatialpartitioning.Thereare,however, two topicsin computersecuritythatcorrespondto issuesin temporalpartitioning:timing channelsanddenialof service.

Timing channels(they were called “covert channels”when first identified [60]) aremechanismsfor clandestineinformationflow thatwork by modulatingthetimewhensomeeventsoccuror therateat which they occur. For example,aprocesscanchoosewhetherornot to give up its time sliceearly. If only two processesarerunning,theotherprocesscanusethetimeat which it receivescontrolto infer thechoicemadeby theotherprocess[60].More generally, the decisionsof a real-timeschedulercan be manipulatedto provide achannelfor informationflow [16]. Othertiming channelsmodulatetheloador contentionon somesystemresource(e.g.,the systembus [42]) or parametersaffecting performance(e.g.,thetime to seeka disk trackis affectedby whetherthepreviousseekwasto a nearbyor distanttrack[50]; thetimeto accessamemorypagewill beaffectedby whetheror not itwaspreviouslyswappedout to disk [95]).

Wherethey cannotberemoved,timing channelsaretypically renderedharmlesseitherby reducingcontentionor by introducingrandomnessinto the behavior of the resourcebeingmanipulated[36,68,104] or by reducingtheprecisionof thevarious“clocks” (e.g.,time-of-dayclocks, timers, instructionloops, asynchronousI/O performance)by whicha processcanmeasurethe passageof time [42]. Thesemeasuresdo not block a timingchannel,but they introducesufficientnoisethatits bandwidthis reducedto acceptablelevels(typically lessthan10bitspersecond).

Whereasthe concernsof partitioningandsecurityarequite closein the caseof stor-agechannels,they diverge for timing channels.Thevery existenceof a timing channelisunacceptablein a partitionedsystem,sinceit indicatesthat onepartition canchangethetemporalbehavior observedby another. Similarly, theremediesusedin securityto reducethebandwidthof timing channelsareworsethantheoriginalproblemfrom theperspectiveof partitioningbecausethey introducefurtherunpredictabilityinto systembehavior.

Formal analysisof pure timing channelsis generallybasedon information theory(e.g.,[76,77]), but thereis disputeoverwhethersomechannels(e.g.,thediskarmchannel)really aretiming channels,storagechannels,or a combinationof the two [114]. Conse-quently, formaldescriptionandanalysisof suchchannelsis difficult, andinformalmethodsaregenerallyemployed. As describedin Section3.1.2,staticpartitionschedulingrequiresimplementationchoices(strict determinism,no concurrentI/O) that eliminatethemecha-nismsthatcouldserve astiming channels.In systemsthatdonot requiresuchstrict tempo-

4.4. Applicationto Partitioning 49

ral partitioning,thetechniquesusedin computersecurityto identify timing channels[114]mighthelprevealunexpectedsourcesof temporalinterference.

Denialof servicecanbeseenasanextremetypeof timingchannel:theperceivedperfor-manceof someresourceis reducedto anunacceptablelevel, ratherthanmerelymodulated.In thelimit, theresourcemaybecomeunavailableto someprocesses.Thepossibilityof thislimiting caseis usuallyequivalentto theexistenceof astoragechannel.For example,if filespaceis sharedbetweentwo processes,thenonecandeny serviceto theotherby consumingall availablespace—but this is alsoachannelby whichoneprocesscanconvey informationto another(thereceiving channelattemptsto createafile: successis takenasa1 bit, failureasa0; thetransmittingprocessdeterminestheoutcomeby consumingandreleasingspace).Becausedenialof serviceis relatedto timing andstoragechannels,it canbepreventedbyenforcingstrictspatialandtemporalpartitioning.In general-purposesystems,thestrictnessof thesemechanismsmaybeconsideredundesirable:they wouldrequire,for example,fixedper-processallocationsof file space.Attemptsto provide flexible resourceallocationwith-out incurringtherisk of denialof servicerequire“useragreements”thatplacelimits on thedemandsthateachprocessmayplaceoneachresourceandthatareenforcedby a“resourceallocationmonitor” or “denialof serviceprotectionbase”[64,74] (thesearesomewhatsim-ilar to the quality of serviceideasusedin multimediasystems[102]). Formalizationsoftheseapproachesarestatedin termsof fair or maximumwaiting times[32,115].

Thesemore elaboratetreatmentsof denial of serviceare probablyunacceptableinstrictly partitionedsystemsbecausethey still allow the responseperceived by oneappli-cationto be influenced,even if not denied,by another. They mayalsobeunnecessaryinpartitionedsystemsbecausetherequirementsfor temporalpartitioningseemstrongerthanthosefor denialof service:thus,denialof serviceshouldautomaticallybeexcludedin anysystemthatprovidesstrict temporalpartitioning.Formal justificationfor this claim wouldbeaninterestingandworthwhileexercise.

4.4 Application to Partitioning

The formal models for computersecurity reviewed in the previous sectionsprovideseveral ideas that seemapplicableto partitioning. In particular, the central idea ofnoninterference—thatthe behavior perceived at onesecuritylevel shouldbe independentof actionsat higherlevels—canbereinterpretedin thecontext of partitioningandfault tol-eranceby supposingthatordinarybehavior shouldbeindependentof faults: that is, faultsareactionsinvoked by the environment,which is at a level that shouldbe noninterferingwith thelevel of ordinaryusers.This approachhasbeenexploredby Weberandby Simp-son[99,100,110,111]. It workswell asa specificationfor partitioningwhenthepartitionsarecompletelyisolated(in which caseit is equivalentto thestrict form of separability):ifwe have two partitionsA andB thatdo not communicatein any way, thensayingthat thebehavior of B mustbe independentof thatof A is a goodway to saythat faultsin A mustnot affect B. It works lesswell whenA hasto communicatewith B: noninterferencesays


only thatA interfereswith B anddoesnotdiscriminatebetweenlegitimateinterference(theknown communicationstream)andillegitimate(e.g.,changesto B’sprivatedata).

This exampleshows that theconcernsof securityare,in a certainsense,too coarsetocapturethoseof partitioning:securityis concernedonly with whetherinformationcanflowfrom A to B, not with how the flow canaffect B. Channelcontrol and its formalizationby intransitive noninterferencedoesallow the desireddiscrimination,but only at the costof introducinga third componentC to representthe buffer usedfor the intendedA to Bcommunicationstream. Using intransitive flows, we would specify

� � `Q� � and� !� � . This approachseemsto capturesomeof the concernsof partitioning,but theintroductionof thethird componentis artificial andunattractive.

A morefundamentalobjectionto the ideathat noninterferencecanserve asa modelfor partitioningis that partitioningis a safetyproperty(becauseviolationsof partitioningoccurat specificpointsin specificruns)whereasnoninterferenceis not evena “property”(recall page44). This suggeststhat noninterferenceis an unnecessarilysubtlenotion forpartitioning,andthatsomethingsimplershouldsuffice.

Thereis anothersensein which the concernsof securitydiverge from thoseof parti-tioning: securityassumesthatall componentsareuntrustworthy andthat themechanismsof securitymustbesetup sothatonly allowedinformationflowsoccur, nomatterhow thecomponentsbehave. In partitioning,however, we areconcernedonly with misbehavior byfaultypartitionsandarewilling to trustnonfaultycomponentsto safeguardtheirown inter-ests.For example,supposethattwo components

�and � arestaticallyscheduledandthat

eachbegins executionat a known entry point eachtime it is scheduled(this is the restartmodelof partitionswapping).Supposefurtherthateachhasanareaof “scratchpad”mem-ory thatis assumedto be“dirty” at thestartof eachexecution:thatis, thesoftwarein eachof A andB is verifiedto performits functionswith noassumptionsontheinitial contentsofthescratchpadmemory. Finally, supposethatA andB arerequiredto beisolatedfrom oneanother. ThenthescratchpadcanbesharedbetweenA andB underthepartitioninginter-pretationof isolation,but not underthecorrespondingsecurityinterpretation.Thereasonis that whenB receivescontrol, the scratchpadmay containdatawritten by A; underthesecurityinterpretationwemayassumenothingaboutevenanonfaulty B (in particular, thatit will not “peek” at thedataleft by A), andsothescratchpadis a channelfor informationflow from A to B in violationof theisolationsecuritypolicy. In thepartitionedsystem,weaccept(or specify)thata nonfaulty B doesnot do this,andour concernis to besurethatA(even if faulty) doesnot write outsideits own memoryor thescratchpad.Noticethat thisarrangementwould not be safein the restorationmodelof partitionswapping,becauseAcouldpreemptB, changeits scratchpad,andthenallow B to resume.

Theseexamplesdemonstratethat the concernsof partitioningandsecurity, althoughrelated,do not coincide. Thus,althoughformal treatmentsof partitioningmay possiblybe developedusingideasfrom computersecurity, they cannotbebaseddirectly on exist-ing securitymodels.Researchto develop formal modelsof partitioning,andto refinethedistinctionsbetweenpartitioningandsecurity, wouldbeilluminatingfor bothfields.

Chapter 5

Conclusion

We have reviewedsomeof themotivationfor integratedmodularavionicsandtherequire-mentfor partitioningin sucharchitectures.We thenconsideredmechanismsfor achievingpartitioning;the interactionsbetweenthesemechanismsandthosefor systemstructuring,scheduling,andfault tolerance;andissuesin providing assurancefor partitioning.Finally,we reviewedwork in computersecuritythathassimilarmotivationto partitioning.

Althoughpartitioningis averystrongrequirementandimposesmany restrictions,thereis asurprisinglywiderangeof architecturalchoicesthatcanachieve adequatepartitioning.Thespaceof thesedesignchoicesis seenmostclearlyin scheduling,wherebothstaticanddynamicschedulesseemableto combineflexibility with highly assuredpartitioning.

The strongestneedfor future work is to develop the narrative descriptiongiven hereintoamathematicalframework thatwill permitrigorousanalysisof architecturalchoicesforpartitionedsystemsandprovideastrongbasisfor theassuranceof individualdesigns.Thereis alreadysomesignificantwork in this direction[24,26,106,113], but greatopportunitiesremain,particularlywith respectto distributedsystemsandtemporalpartitioning. We areexaminingthesetopicsin currentwork andwill describeour resultsin a successorto thisreport.

51

52

References

1. ARINCSpecification651: DesignGuidancefor IntegratedModularAvionics. Aero-nauticalRadio,Inc,Annapolis,MD, November1991.Preparedby theAirlines Elec-tronicEngineeringCommittee.

2. ARINCSpecification659: BackplaneDataBus. AeronauticalRadio,Inc, Annapolis,MD, December1993.Preparedby theAirlines ElectronicEngineeringCommittee.

3. ARINCSpecification629: Multi-TransmitterData Bus; Part 1, Technical Descrip-tion (with fivesupplements);Part 2,ApplicationGuide(with onesupplement). Aero-nauticalRadio, Inc, Annapolis,MD, December1995/6. Preparedby the AirlinesElectronicEngineeringCommittee.

4. ARINCSpecification653: AvionicsApplicationSoftware Standard Interface. Aero-nauticalRadio,Inc, Annapolis,MD, January1997. Preparedby theAirlines Elec-tronicEngineeringCommittee.

5. B. Alpern andF. B. Schneider. Defining liveness.InformationProcessingLetters,21(4):181–185,October1985.

6. GregoryR. AndrewsandRichardP. Reitman.An axiomaticapproachto informationflow in programs. ACM Transactionson ProgrammingLanguages and Systems,2(1):56–76,January1980.

7. C. R.Attanasio,P. W. Markstein,andR.J.Phillips. Penetratinganoperatingsystem:A studyof VM/370 integrity. IBM SystemsJournal, 15(1):102–116,1976.

8. AlgirdasAvizienisandYutaoHe. Microprocessorentomology:A taxonomyof de-signfaultsin COTSmicroprocessors.In WeinstockandRushby[112], pages3–23.

9. HenryM. Ballard,David M. Bicksler, ThomasTaylor, andH. O. Lubbes.Ensuringprocesssecurityin theALS/N environment. In COMPASS’86 (Proceedingsof theFirst AnnualConferenceon ComputerAssurance), pages60–68,IEEE WashingtonSection,Washington,DC, July1986.

53

54 References

10. D. E.Bell andL. J.LaPadula.Securecomputersystem:UnifiedexpositionandMul-tics interpretation.TechnicalReportESD-TR-75-306,Mitre Corporation,Bedford,MA, March1976.

11. BrianBershad,StefanSavage,Przemyslaw Pardyak,EminGunSirer, David Becker,Marc Fiuczynski,CraigChambers,andSusanEggers.Extensibility, safetyandper-formancein theSPINoperatingsystem.In FifteenthACM SymposiumonOperatingSystemPrinciples, pages267–284,CopperMountain,CO, December1995. (ACMOperatingSystemsReview, Vol. 29,No. 5).

12. K. J. Biba. Integrity considerationsfor securecomputersystems.TechnicalReportMTR 3153,Mitre Corporation,Bedford,MA, June1975.

13. W. E.BoebertandR.Y. Kain. A practicalalternativetohierarchicalintegrity policies.In Proceedings8thDoD/NBSComputerSecurityInitiativeConference, pages18–27,Gaithersburg, MD, September1985.

14. Mikhail Chernyshov. Post-mortemonfailure. Nature, 339:9,May 4, 1989.

15. David D. Clark andDavid R. Wilson. A comparisonof commercialandmilitarycomputersecuritypolicies. In Proceedingsof theSymposiumon Securityand Pri-vacy, pages184–194,IEEEComputerSociety, Oakland,CA, April 1987.

16. RaymondK. Clark,DouglasM. Wells,Ira B. Greenberg, PeterK. Boucher, TeresaF.Lunt, PeterG. Neumann,andE. DouglasJensen.Effectsof multilevel securityonreal-timeapplications.In Proceedingsof theNinthAnnualComputerSecurityAppli-cationsConference, pages120–129,IEEE ComputerSociety, Orlando,FL, Decem-ber1993.

17. HenryS.F. CooperJr. Annalsof space(theplanetarycommunity)—part1: Phobos.New Yorker, pages50–84,June11,1990.

18. Henry S. F. CooperJr. TheEveningStar: VenusObserved. FarrarStrausGiroux,New York, NY, 1993.

19. RobertD. CulpandGeorgeBickley, editors.Proceedingsof theAnnualRocky Moun-tain Guidanceand Control Conference, Advancesin the Astronautical Sciences,Keystone,CO,February1993.AmericanAstronauticalSociety.

20. Sadegh Davari andLui Sha. Sourcesof unboundedpriority inversionsin real-timesystemsand a comparative study of possiblesolutions. ACM Operating SystemsReview, 26(2):110–120,April 1992.

21. D. E. DenningandP. J. Denning. Certificationof programsfor secureinformationflow. Communicationsof theACM, 20(7):504–513,July1977.

References 55

22. David L. Detlefs. An overview of the ExtendedStaticCheckingsystem. In FirstWorkshoponFormal Methodsin Software Practice(FMSP’96), pages1–9,Associ-ationfor ComputingMachinery, SanDiego,CA, January1996.

23. Profibus Standard: DIN 19245. DeutscheIndustrieNorm, Berlin, Germany. Twovolumes.

24. Ben L. Di Vito. A formal modelof partitioning for integratedmodularavionics.NASA ContractorReportCR-1998-208703,NASA Langley ResearchCenter, Au-gust1998.

25. EileenM. Dukes.Magellanattitudecontrolmissionoperations.In CulpandBickley[19], pages375–388.

26. BrunoDutertreandVictoriaStavridou. A modelof non-interferencefor integratingmixed-criticalitysoftwarecomponents.In WeinstockandRushby[112], pages301–316.

27. The interfacesbetweenflightcrews andmodernflight decksystems.Reportof theFAA humanfactorsteam, FederalAviation Administration, 1995. Available athttp://www.faa.gov/avr/afs/interfac.pdf.

28. SystemDesignandAnalysis. FederalAviation Administration,June21, 1988. Ad-visoryCircular25.1309-1A.

29. RTCAInc., DocumentRTCA/DO-178B. FederalAviation Administration,January11,1993.AdvisoryCircular20-115B.

30. R. J.Feiertag,K. N. Levitt, andL. Robinson.Proving multilevel securityof asystemdesign. In SixthACM Symposiumon Operating SystemPrinciples, pages57–65,November1977.

31. BryanFord,GodmarBack,Greg Benson,JayLepreau,Albert Lin, andOlin Shivers.TheFlux OSKit: A substratefor kernelandlanguageresearch.In SOSP-16[101],pages38–51.

32. Virgil D. Gligor. A noteon denial-of-servicein operatingsystems.IEEE Transac-tionsonSoftware Engineering, SE-10(3):320–324,May 1984.

33. J.A. GoguenandJ.Meseguer. Securitypoliciesandsecuritymodels.In Proceedingsof the Symposiumon Securityand Privacy, pages11–20,IEEE ComputerSociety,Oakland,CA, April 1982.

34. J. A. GoguenandJ. Meseguer. Inferencecontrol andunwinding. In Proceedingsof the Symposiumon Securityand Privacy, pages75–86,IEEE ComputerSociety,Oakland,CA, April 1984.

56 References

35. B. D. Gold,R. R. Linde,andP. F. Cudney. KVM/370 in retrospect.In Proceedingsof the Symposiumon Securityand Privacy, pages13–23,IEEE ComputerSociety,Oakland,CA, April 1984.

36. JamesW. Gray, III. Onintroducingnoiseinto thebus-contentionchannel.In SSP’93[45], pages90–98.

37. J. ThomasHaigh and William D. Young. Extendingthe noninterferenceversionof MLS for SAT. IEEE Transactionson Software Engineering, SE-13(2):141–150,February1987.

38. HermannHartig,MichaelHohmuth,JochenLiedtke,andSebastianSchonberg. Theperformanceof � -kernel-basedsystems.In SOSP-16[101], pages66–77.

39. C. A. R. Hoare. CommunicatingSequentialProcesses. PrenticeHall InternationalSeriesin ComputerScience.PrenticeHall, HemelHempstead,UK, 1985.

40. Harry Hopkins. Fit andforgetfly-by-wire. Flight International, pages89–92,De-cember3, 1988.

41. KennethHoymeandKevin Driscoll. SAFEbusTM. IEEE AerospaceandElectronicSystemsMagazine, 8(3):34–39,March1993.

42. Wei-Ming Hu. Reducingtiming channelswith fuzzy time. In SSP’91[44], pages8–20.

43. M. Huguet. The protectionof the processorstatusword of the PDP-11/60. ACMComputerArchitecture News, 10(4):27–30,June1982.

44. Proceedingsof the Symposiumon Securityand Privacy, Oakland,CA, May 1991.IEEEComputerSociety.

45. Proceedingsof the Symposiumon Securityand Privacy, Oakland,CA, May 1993.IEEEComputerSociety.

46. Fault Tolerant ComputingSymposium25: Highlightsfrom25 Years, Pasadena,CA,June1995.IEEEComputerSociety.

47. ISO Standard 11898: Road Vehicles—Interchange of Digital Information—Controller AreaNetwork(CAN)for High-SpeedCommunication. InternationalStan-dardsOrganization,Switzerland,November1993.

48. JeremyJacob. A noteon theuseof separabilityfor thedetectionof covert channels.Cipher(Newsletterof theIEEETechnicalCommitteeonSecurityandPrivacy), pages25–33,Summer1989.

References 57

49. M. FransKaashoek,Dawson R. Engler, Gregory R. Ganger, Hector M. Briceno,RussellHunt,David Mazieres,ThomasPinckney, RobertGrimm,JohnJannotti,andKennethMackenzie.Applicationperformanceandflexibility onExokernelsystems.In SOSP-16[101], pages52–65.

50. Paul A. Karger andJohnC. Wray. Storagechannelsin disk arm optimization. InSSP’91[44], pages52–61.

51. Rick KasudaandDonnaSexton Packard.Spacecraftfault tolerance:TheMagellanexperience.In CulpandBickley [19], pages249–267.

52. Philip J. Koopman,Jr. Perilsof the PC cache. EmbeddedSystemsProgramming,6(5):26–34,May 1993.

53. HermanKopetzand R. Nossal. Temporalfirewalls in large distributed real-timesystems.In 6th IEEE Workshopon Future Trendsin DistributedComputing, pages310–315,IEEEComputerSociety, Tunis,Tunisia,October1997.

54. HermannKopetz. Shouldresponsive systemsbeevent-triggeredor time-triggered?IEICE TransactionsonInformationandSystems, E76-D(11):1325–1332,November1993. Instituteof Electronics,Information,andCommunicationsEngineers,Japan.

55. HermannKopetz. Real-TimeSystems:DesignPrincplesfor DistributedEmbeddedApplications. TheKluwer InternationalSeriesin EngineeringandComputerScience.Kluwer, Dordrecht,TheNetherlands,1997.

56. HermannKopetz. A comparisonof CAN andTTP. Technicalreport,TechnischeUniversitat Wien,Vienna,Austria,March1998.

57. HermannKopetz.Thetime-triggered(TT) modelof computation.Technicalreport,TechnischeUniversitat Wien,Vienna,Austria,March1998.

58. HermannKopetzandGunterGrunsteidl.TTP—aprotocolfor fault-tolerantreal-timesystems.IEEEComputer, 27(1):14–23,January1994.

59. A. A. Lambregts. Automaticflight controls:Conceptsandmethods.Draft paperbyFAA NationalResourceSpecialistfor AdvancedControls,January1998.

60. B. W. Lampson.A noteon theconfinementproblem.Communicationsof theACM,16(10):613–615,October1973.

61. C. E. Landwehr. A survey of formalmodelsfor computersecurity. ACM ComputingSurveys, 13(3):247–278,September1981.

62. JohnP. Lehoczky, Lui Sha,andYeDing. Theratemonotonicschedulingalgorithm—exactcharacterizationandaveragecasebehavior. In RealTimeSystemsSymposium,pages166–171,IEEEComputerSociety, SantaMonica,CA, December1989.

58 References

63. K. RustanM. Leino andRajeev Joshi. A semanticapproachto secureinformationflow. TechnicalReport032,Digital SystemsResearchCenter, Palo Alto, CA, De-cember1997.

64. JussipekkaLeiwo andYuliangZheng. A methodto implementa denialof serviceprotectionbase. In Vijay Varadharajan,JosefPieprzyk,andYi Mu, editors,Infor-mationSecurityandPrivacy: SecondAustralasianConference(ACISP’97), Volume1270of Springer-VerlagLectureNotesin ComputerScience, pages90–101,Sydney,Australia,July1997.

65. S.B. Lipner. A commenton theconfinementproblem.In Fifth ACM SymposiumonOperatingSystemPrinciples. pages192–196,ACM, 1975.

66. C. L. Liu andJamesW. Layland.Schedulingalgorithmsfor multiprogrammingin ahard-real-timeenvironment.Journalof theACM, 20(1):46–61,January1973.

67. C. DouglassLocke. Softwarearchitecturefor hardreal-timeapplications:Cyclicexecutivesvs.priority executives.Real-TimeSystems, 4(1):37–53,March1992.

68. Keith Loepere. Resolvingcovert channelswithin a B2 classsecuresystem. ACMOperatingSystemsReview, 19(3):9–28,July1985.

69. Yoshifumi ManabeandShigemiAoyagi. A feasibility decisionalgorithmfor ratemonotonicanddeadlinemonotonicscheduling.Real-TimeSystems, 14(2):171–181,March1998.

70. R. A. Mayer andL. H. Seawright. A virtual machinetime sharingsystem. IBMSystemsJournal, 9(3):199–218,1970.

71. Daryl McCullough. Specificationsfor multi-level securityanda hook-upproperty.In Proceedingsof the Symposiumon Securityand Privacy, pages161–166,IEEEComputerSociety, Oakland,CA, April 1987.

72. JohnMcLean. A commenton the“basicsecuritytheorem”of Bell andLa Padula.InformationProcessingLetters, 20:67–70,1985.

73. JohnMcLean.A generaltheoryof compositionfor tracesetsclosedunderselectiveinterleaving functions. In Proceedingsof the Symposiumon Research in SecurityandPrivacy, pages79–93,IEEEComputerSociety, Oakland,CA, May 1994.

74. JonathanK. Millen. A resourceallocationmodelfor denialof service.In Proceed-ingsof theSymposiumon Research in SecurityandPrivacy, pages137–147,IEEEComputerSociety, Oakland,CA, May 1992.

References 59

75. Mark Moriconi,XiaoleiQian,R.A. Riemenschneider, andLi Gong.Securesoftwarearchitectures.In Proceedingsof theSymposiumonSecurityandPrivacy, pages84–93, IEEEComputerSociety, Oakland,CA, May 1997.

76. Ira S. Moskowitz, Steven J. Greenwald, andMyong H. Kang. An analysisof thetimedZ-channel.In Proceedingsof theSymposiumon SecurityandPrivacy, pages2–31,IEEEComputerSociety, Oakland,CA, May 1996.

77. Ira S. Moskowitz and Alan R. Miller. Simple timing channels. In Proceedingsof the Symposiumon Securityand Privacy, pages56–64,IEEE ComputerSociety,Oakland,CA, May 1994.

78. A Nadesakumar, R. M. Crowder, andC. J.Harris. Advancedsystemconceptsor fu-turecivil aircraft—anoverview of avionic architectures.Proceedingsof theInstitu-tion of MechanicalEngineers, Part G: Journalof AerospaceEngineering, 209:265–272,1995.

79. Integrity in AutomatedInformationSystems. NationalComputerSecurityCenter,September1991.TechnicalReport79-91.

80. GeorgeC. Necula. Proof-carryingcode. In 24thACM Symposiumon PrinciplesofProgrammingLanguages, pages106–119,Paris,France,January1997.

81. GeorgeC. NeculaandPeterLee. Safekernelextensionswithout run-timechecking.In 2nd Symposiumon Operating SystemsDesignand Implementation(OSDI ’96),pages229–243,Seattle,WA, October1996.

82. BrianOki, ManfredPfluegl, Alex Siegel,andDaleSkeen.TheInformationBus—anarchitecturefor extensibledistributedsystems.In FourteenthACM SymposiumonOperating SystemPrinciples, pages58–68,Asheville, NC, December1993. (ACMOperatingSystemsReview, Vol. 27,No. 5).

83. GeraldJ. PopekandDavid R. Farber. A modelfor verificationof datasecurityinoperatingsystems.Communicationsof theACM, 21(9):737–749,September1978.

84. DO-178B:Software Considerationsin Airborne Systemsand EquipmentCertifica-tion. RequirementsandTechnicalConceptsfor Aviation,Washington,DC, Decem-ber1992.Thisdocumentis known asEUROCAEED-12Bin Europe.

85. P. Richards.Timing propertiesof multiprocessorsystems.TechnicalReportTDB60-27,Tech.OperationsInc.,Burlington,MA, August1960.

86. A. W. Roscoe.CSPanddeterminismin securitymodelling. In Proceedingsof theSymposiumonSecurityandPrivacy, pages114–127,IEEEComputerSociety, Oak-land,CA, May 1995.

60 References

87. A. W. Roscoe,J.C. P. Woodcock,andL. Wulf. Non-interferencethroughdetermin-ism. Journalof ComputerSecurity, 4(1):27–53,1996.

88. JohnRushby. Thedesignandverificationof securesystems.In EighthACM Sympo-siumonOperating SystemPrinciples, pages12–21,Asilomar, CA, December1981.(ACM OperatingSystemsReview, Vol. 15,No. 5).

89. JohnRushby. Proofof Separability—Averificationtechniquefor a classof securitykernels. In Proc. 5th InternationalSymposiumon Programming, Volume 137 ofSpringer-Verlag Lecture Notesin ComputerScience, pages352–367,Turin, Italy,April 1982.

90. JohnRushby. Noninterference,transitivity, and channel-controlsecuritypolicies.TechnicalReportSRI-CSL-92-2,ComputerScienceLaboratory, SRI International,MenloPark,CA, December1992.

91. JohnRushby. Formalmethodsandthecertificationof critical systems.TechnicalRe-port SRI-CSL-93-7,ComputerScienceLaboratory, SRI International,Menlo Park,CA, December1993. Also issuedunderthe title Formal MethodsandDigital Sys-temsValidation for AirborneSystemsasNASA ContractorReport4551,December1993.

92. JohnRushby. Formalmethodsandtheir role in thecertificationof critical systems.TechnicalReportSRI-CSL-95-1,ComputerScienceLaboratory, SRI International,Menlo Park, CA, March 1995. Also availableasNASA ContractorReport4673,August1995,andissuedaspart of the FAA Digital SystemsValidation Handbook(theguidefor aircraftcertification).Reprintedin [97, pp.1–42].

93. JohnRushby. A foundationfor securitykernelverification. Technicalreport,Com-puterScienceLaboratory, SRI International,Menlo Park,CA, October1995. Infor-mal report.

94. JohnRushbyandDavid W. J. Stringer-Calvert. A lesselementarytutorial for thePVSspecificationandverificationsystem.TechnicalReportSRI-CSL-95-10,Com-puterScienceLaboratory, SRI International,Menlo Park, CA, June1995. Revised,July 1996.Available,with specificationfiles, at http://www.csl.sri.com/csl-95-10.html.

95. MarvinSchaefer, BarryGold,RichardLinde,andJohnScheid.Programconfinementin KVM/370. In ACM NationalConference, pages404–410,Seattle,WA, October1977.

96. Lui Sha,RagunathanRajkumar, andJohnP. Lehoczky. Priority inheritanceproto-cols: An approachto real-timesynchronization.IEEE Transactionson Computers,39(9):1175–1185, September1990.

References 61

97. RogerShaw, editor. SafetyandReliability of Software BasedSystems(TwelfthAn-nualCSRWorkshop), Bruges,Belgium,September1995.

98. O. Sibert,P. Porras,andR. Lindell. TheIntel 80x86processorarchitecture:Pitfallsfor securesystems.In Proceedingsof theSymposiumonSecurityandPrivacy, pages211–222,IEEEComputerSociety, Oakland,CA, May 1995.

99. Andrew Simpson,Jim Woodcock,andJim Davies. Safetythroughsecurity. In Pro-ceedingsof theNinth InternationalWorkshoponSoftware SpecificationandDesign,pages18–24,IEEEComputerSociety, Ise-Shima,Japan,April 1998.

100. Andrew Clive Simpson. Safetythrough Security. PhD thesis,Oxford UniversityComputingLaboratory, 1996. Availableathttp://www.comlab.ox.ac.uk/oucl/users/andrew.simpson/thesis.ps.gz.

101. SixteenthACM SymposiumonOperatingSystemPrinciples, Saint-Malo,France,Oc-tober1997.(ACM OperatingSystemsReview, Vol. 31,No. 5).

102. OliverSpatscheckandLarry Peterson.Defendingagainstdenialof serviceattacksinScout. In Proceedingsof the3rd UsenixSymposiumon Operating SystemsDesignandImplementation(OSDI), pages59–72,New Orleans,LA, February1999.

103. ChristopherTemple.Avoiding thebabbling-idiotfailurein a time-triggeredcommu-nicationsystem.In Fault Tolerant ComputingSymposium28, pages218–227,IEEEComputerSociety, Munich,Germany, June1998.

104. JonathanT. Throstle.Modelinga fuzzy timesystem.In SSP’93[45], pages82–89.

105. PauloVerıssimo,Jose Rufino,andLi Ming. How hardis hardreal-timecommuni-cationon field-buses?In Fault Tolerant ComputingSymposium27, pages112–121,IEEEComputerSociety, Seattle,WA, June1997.

106. Ben L. Di Vito. A model of cooperative noninterferencefor integratedmodularavionics. In WeinstockandRushby[112], pages269–286.

107. DennisVolpano,CynthiaIrvine,andGeoffrey Smith.A soundtypesystemfor secureflow analysis.Journalof ComputerSecurity, 4(2,3):167–187,1996.

108. JanVytopil, editor. Formal Techniquesin Real-Time and Fault-Tolerant Systems.Kluwer InternationalSeriesin EngineeringandComputerScience.Kluwer, Boston,Dordecht,London,1993.

109. RobertWahbe,StevenLucco,ThomasE. Anderson,andSusanL. Graham.Efficientsoftware-basedfault isolation. In FourteenthACM Symposiumon Operating Sys-temPrinciples, pages203–216,Asheville, NC, December1993. (ACM OperatingSystemsReview, Vol. 27,No. 5).

62 References

110. D. G. Weber. Formal specificationof fault-toleranceand its relation to computersecurity. In Proceedingsof the Fifth InternationalWorkshopon Software Specifi-cationandDesign, pages273–277,Pittsburgh, PA, May 1989. PublishedasACMSIGSOFTEngineeringNotes,Volume14,Number3.

111. DougG. Weber. Fault toleranceasself-similarity. In Vytopil [108], chapter2, pages33–49.

112. CharlesB. WeinstockandJohnRushby, editors.DependableComputingfor CriticalApplications—7, Volume12of IEEEComputerSocietyDependableComputingandFault TolerantSystems, SanJose,CA, January1999.

113. Matthew M. Wilding, David S.Hardin,andDavid A. Greve. Invariantperformance:A statementof taskisolationusefulfor embeddedapplicationintegration. In Wein-stockandRushby[112], pages287–300.

114. JohnC. Wray. An analysisof timing channels.In SSP’91[44], pages2–7.

115. Che-FnYu andVirgil D. Gligor. A specificationandverificationmethodfor thepre-ventionof denialof service.IEEETransactionsonSoftwareEngineering, 16(6):581–592,June1990.

Documents

Partitioning in Avionics Architectures: Requirements ... · PDF filePartitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance John Rushby Computer Science Laboratory