Variations of Virtual Memory
CSE 240A Student Presentation
Paul Loriaux, Thursday, January 21, 2010
VM: Real and Imagined
Every user process assigned its own linear address space.
Each address space a single protection domain shared by all threads.
Sharing only possible at page granularity.
Disadvantage 1: Pointer meaningless outside its address context.
Disadvantage 2: Transfer of control across protection domains requires expensive context switch.
In other words, sharing is hard and slow.
Compare this to “ideal” VM as imagined years ago.
Every allocated region a “segment” with its own protection information.
However, this has so far proved to be slow and cumbersome. So far...
Enter Mondrian memory protection (MMP)!
Offers fine-grained memory protection,
with the simplicity and efficiency of today’s linear addressing,
with acceptably small run-time overheads.
How? By (A) allowing different protection domains (PDs) to have different permissions on the same memory region.
By (B) supporting sharing granularity smaller than a page.
By (C) allowing PDs to own regions of memory and grant or revoke privileges.
Conventional linear VM systems fail on (A) and (B).
Page-group systems fail on (A) and (B).
Capability-based systems fail mainly on (C), arguably on (A).
MMP Design
1. A Permissions Table, one per PD and stored in privileged memory, specifies the permissions that PD has for every address in the address space.
2. A control register holds the address of the active PD’s permissions table.
3. A PLB caches entries from (1) to reduce memory accesses.
4. A sidecar register, one per address register, caches the last segment accessed by its associated register.
How to store permissions, take 1: SST
A compressed permissions table reduces space needed for permissions.
Goal: balance (a) space overhead, (b) access time overhead, (c) PLB utilization, and (d) time to modify the tables when permissions change.
Sorted Segment Table
A linear, sorted array of segments, permitting a binary search on PLB miss.
Segments can be any number of words in length, but cannot overlap.
Each entry in the SST includes a 30-bit start address and a 2-bit permissions field.
Problem: can still take many steps to locate a segment when the number of segments is large.
Problem: can only be shared between PDs in its entirety.
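The SST lookup described above can be sketched as a binary search over segment start addresses. This is a minimal illustration, not the hardware algorithm: the permission constants, the tuple layout, and the name `sst_lookup` are assumptions made for the example, and each segment is taken to run until the next entry's start address.

```python
from bisect import bisect_right

# Hypothetical 2-bit permission encoding; the slides do not give the real values.
PERM_NONE, PERM_RO, PERM_RW, PERM_EX = 0, 1, 2, 3


def sst_lookup(sst, word_addr):
    """Return the permission of the segment that contains word_addr.

    sst is a list of (start_word_address, perm) tuples sorted by start
    address. A segment extends up to the next entry's start, so segments
    never overlap.
    """
    starts = [start for start, _ in sst]
    # Index of the last segment that starts at or before word_addr.
    i = bisect_right(starts, word_addr) - 1
    if i < 0:
        return PERM_NONE  # address lies before the first segment
    return sst[i][1]


# Illustrative table: four variable-length segments.
sst = [(0x0000, PERM_NONE), (0x0100, PERM_RO),
       (0x0180, PERM_RW), (0x0400, PERM_NONE)]
```

The search takes O(log n) steps for n segments, which is the cost the first "Problem" line refers to when the table grows large.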
How to store permissions, take 2: MLPT
Multi-level Permissions Table
A multi-level table, sort of like an inode.
The root table has 1024 entries, each of which maps a 4 MB block; each entry at the next level maps a 4 KB block; and each of the 64 entries in a leaf table provides individual permissions for 16 x 4 B words.
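To make the block sizes concrete, here is a sketch of how a 32-bit byte address could be split into table indices. The bit positions are derived from the sizes on the slide (4 MB = 2^22 bytes, 4 KB = 2^12 bytes, 16 words x 4 B = 2^6 bytes); the 32-bit address width and the function name `mlpt_indices` are assumptions for illustration.

```python
def mlpt_indices(addr):
    """Split a 32-bit byte address into (root, mid, leaf, word) indices."""
    root = (addr >> 22) & 0x3FF  # 1024 root entries, each mapping 4 MB
    mid = (addr >> 12) & 0x3FF   # 4 MB / 4 KB = 1024 entries per mid-level table
    leaf = (addr >> 6) & 0x3F    # 64 leaf entries, each covering 16 words (64 bytes)
    word = (addr >> 2) & 0xF     # which of the 16 four-byte words in that entry
    return root, mid, leaf, word
```

A lookup would then walk root, mid-level, and leaf tables using these indices, one memory reference per level.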
How are permissions stored in those 4-byte words?
Option 1: Permission Vector Entries
Option 2: Mini-SST entries
Permission Vector Entries
Well, you’ve got 32 bits, you have 2-bit permissions, so just chop the entry up into 16 2-bit values indicating the permissions for each of 16 words.
Problem: does not take advantage of the fact that most user segments are longer than a single word, i.e. not compact.
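A permission vector entry can be sketched as plain bit packing. The bit order (word 0 in the two lowest bits) and the helper names are assumptions for this example; the slides only state that 16 two-bit fields share one 32-bit entry.

```python
def pack_vector(perms):
    """Pack 16 two-bit permission values into one 32-bit entry."""
    if len(perms) != 16:
        raise ValueError("a permission vector covers exactly 16 words")
    entry = 0
    for i, p in enumerate(perms):
        entry |= (p & 0b11) << (2 * i)  # word i occupies bits 2i and 2i+1
    return entry


def vector_perm(entry, word_index):
    """Extract the 2-bit permission for one of the 16 words."""
    if not 0 <= word_index < 16:
        raise ValueError("word_index must be in 0..15")
    return (entry >> (2 * word_index)) & 0b11
```

Every word costs two bits whether or not its neighbours share the same permission, which is the lack of compactness noted above.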
Mini-SST Entries
Two segments (mid0, mid1) encode two different permissions for 16 words.
One segment (first) encodes permissions for a segment of at most 31 words upstream.
One segment (last) encodes permissions for a segment of at most 32 words downstream.
Total address range: 31 + 16 + 32 = 79 words.
Advantage: much more compact.
Advantage: overlap in segments may alleviate proximal loads from the table.
Disadvantage: overlapping address ranges complicate table updates.
2-bit entry type: either pointer to next level, pointer to permission vector, or mini-SST entry.
Boosting Performance via 2-Level Permissions Caching
The PLB caches Permissions Table entries, analogous to the TLB.
Low-order “don’t care” bits in the PLB tag increase the number of addresses a PLB entry matches, thus decreasing the PLB miss rate.
Changes in permissions require a PLB flush. As above, “don’t care” bits in the search key allow all PLB entries within the modified region to be invalidated during a single cycle.
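The "don't care" matching can be sketched in software with a per-entry mask. This is an illustrative model only: the `(tag, mask, perm)` tuple layout and both function names are assumptions, and the loop stands in for what the hardware would do across all entries in parallel.

```python
def plb_match(tag, mask, addr):
    """True if addr hits the entry; bits set in mask are "don't care"."""
    return (addr & ~mask) == (tag & ~mask)


def plb_invalidate(plb, key, key_mask):
    """Return the PLB contents with every entry overlapping the modified region removed.

    Both the entry and the search key describe aligned power-of-two
    ranges, so they overlap exactly when they agree on every bit that
    neither side treats as "don't care".
    """
    return [(tag, mask, perm) for (tag, mask, perm) in plb
            if (tag ^ key) & ~(mask | key_mask) != 0]


# Illustrative contents: one 4 KB entry and two 64-byte entries.
plb = [(0x1000, 0xFFF, 2), (0x2000, 0x3F, 1), (0x2040, 0x3F, 3)]
```

A wider mask lets one entry cover a larger region, so fewer entries are needed for the same working set.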
Each address register in the machine has an associated sidecar register.
On a PLB miss, the entry returned by the Permissions Table is also loaded into the appropriate sidecar register.
The base and bound of the user segment represented by the table entry are expanded to facilitate boundary checks.
Idea: a particular address register on the CPU will frequently be used to load/store from/to the same address or one within the same user segment, so hardwire the permissions.
Reduces traffic to the PLB.
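The sidecar idea amounts to a one-entry base/bound cache per address register. The sketch below is a simplification under stated assumptions: the `Sidecar` class, the `access` helper, and the `next_level_lookup` callback (standing in for the PLB and Permissions Table together) are hypothetical names, and the bound is treated as exclusive.

```python
class Sidecar:
    """One-entry cache of the last user segment touched via one address register."""

    def __init__(self):
        self.valid = False
        self.base = self.bound = self.perm = 0

    def fill(self, base, bound, perm):
        self.valid, self.base, self.bound, self.perm = True, base, bound, perm

    def check(self, addr):
        """Return the cached permission on a hit, or None on a sidecar miss."""
        if self.valid and self.base <= addr < self.bound:
            return self.perm
        return None


def access(sidecar, addr, next_level_lookup):
    """Check the sidecar first; fall back to the next level only on a miss."""
    perm = sidecar.check(addr)
    if perm is None:
        base, bound, perm = next_level_lookup(addr)  # assumed to return the whole segment
        sidecar.fill(base, bound, perm)
    return perm
```

Repeated accesses through the same register to the same segment then never reach the PLB, which is where the traffic reduction comes from.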
Evaluating Performance Overhead
Evaluated both C and Java programs (why?) that were a mix of memory-reference intensive and memory-allocation intensive.
One confounding parameter: the degree of granularity. Evaluated the extrema, (a) coarse-grained as provided by today’s VM, and (b) super-fine-grained where every object is its own user segment.
All benchmark programs run on a MIPS simulator modified to trace memory references.
Refs: total no. of loads and stores x 10^6
Segs: no. of segments written to PT
R/U: avg. references per PT update
Cs: no. of coarse-grained segments
Metrics
Runtime overhead = number of permissions table references (rw) ÷ number of memory references made by the application.
Space overhead = space occupied by protection tables ÷ space being used by application (data + instructions) at end of run.
Space being used by application determined by querying every word in memory and seeing if it has valid permissions.
Caveat: space between malloced regions not included in this quantity.
Caveat: not measuring peak overhead.
Caveat: this overhead may or may not manifest itself as performance loss, depending on CPU implementation.
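The two metrics are simple ratios; a sketch follows, together with the word-by-word scan used to size the application. Function names are hypothetical, and treating permission value 0 as "no valid permissions" is an assumption of this example.

```python
def runtime_overhead(table_refs, app_refs):
    """Permissions table references (reads and writes) per application memory reference."""
    return table_refs / app_refs


def space_overhead(table_bytes, app_bytes):
    """Bytes of protection tables per byte in use by the application at end of run."""
    return table_bytes / app_bytes


def words_in_use(perm_of_word, n_words):
    """Count words with valid permissions by querying every word.

    perm_of_word is a callback returning the permission value for a word
    index; 0 is assumed to mean no valid permissions.
    """
    return sum(1 for w in range(n_words) if perm_of_word(w) != 0)
```

For example, 50 table references against 1000 application references would be a 5% runtime overhead; these numbers are illustrative, not results from the evaluation.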
Coarse-Grained Protection Results
MLPT with mini-SST entries and 60-entry PLB versus conventional page table plus TLB.
Expectation: slight space overhead from MLPT leaf tables.
Expectation: slight speed improvement from additional hardware.
Claim: overhead for MMP word-level protection is very low when not used.
Expectations generally hold.
Fine-Grained Protection Results
Removed permissions on malloc header and only allowed program access to the allocated block.
Claim 1. MLPT outperforms SST as segment number increases. Why?
Claim 2. MLPT space overhead is always < 9%.
Claim 3. The mSST table entry outperforms protection vectors.
Memory Hierarchy Performance
Sidecar miss rate about 10-20%. PLB miss rate just 0.5%.
Impact of permissions table accesses on L1/L2 cache efficiency is slight, with less than an additional 0.25% being added to the miss rate in the worst case.
Conclusions
1. Fine-grained segment-based memory protection that is compatible with current linearly addressed ISAs is feasible.
2. The space and runtime overhead of providing this protection is small and scales with the degree of granularity.
3. The MMP facilities can be used to implement efficient applications.
Context
64-bit virtual address spaces are coming.
That’s more address space than a program could ever want or need.
This alleviates the existing evolutionary pressure on OSes to treat virtual addresses as a scarce resource that must be multiply allocated.
All programs can now live in one big happy address space. These are single address space (SAS) operating systems.
Pro: addresses are unique and context independent.
Con: no more private address space means no intrinsic protection.
This paper focuses on how to represent protection information in the cache structures in SAS systems.
The Promises of SAS OSes
Support for Sharing
VAs are globally unique so can be passed between domains without translation.
Alleviates the need for costly RPCs when communicating across protection domains.
Virtually Indexed Caches
Virtually indexed caches are faster than physically indexed caches because no address translation is required.
However, multiple address space OSes must use physical indexing because:
2+ VAs from 2+ PDs may reference the same physical address (synonyms), causing coherency problems.
1 VA from 2+ PDs may reference 2+ physical addresses (homonyms).
Both of these may be circumvented, but at the cost of performance. In SAS systems, synonyms and homonyms don’t exist. Virtual to physical mapping is (can be) 1-to-1.
Motivation
We would like to take advantage of the benefits of SAS OSes.
To do so we need to restore the protection that we lost by giving up a separate address space for every protection domain.
This paper seeks to evaluate two models of hardware support for protection in SAS systems.
What’s wrong with conventional architectures and SAS?
1. Protection domains in a SAS system would typically reference small and widely scattered pieces of the address space. Linear page tables cannot represent such sparse mappings compactly.
2. Translation mappings for shared pages must be duplicated in the page tables of each domain. This is wasteful and invites coherency issues.
Two models for supporting protection in SAS systems
Domain-Page Model
Specifies permissions explicitly for each domain-page pair.
Can be implemented by moving PD tags from the TLB to a protection lookaside buffer (PLB).
Page-Group Model
Defines logical grouping of pages called page-groups.
A PD defined by the set of page-groups it can access.
Each page within a group has access rights that are used by all domains with access to the group.
The PLB
Each PLB entry contains the protection information granted to one PD for one specific virtual page.
On each memory reference the PLB is accessed by the VPN and the PD-ID, the latter provided by a processor control register.
Note the VA is used for both cache and PLB, so lookups can occur in parallel.
Note that separation of translation and protection in this manner allows the PLB to be used in conjunction with a virtually indexed and tagged cache.
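The domain-page PLB can be modelled as a table keyed by (PD-ID, VPN). This is a software sketch only: the `PLB` class here, the 4 KB page size, and the representation of rights as a set of letters are assumptions, and a miss simply returns `None` where real hardware would trap to fill the entry.

```python
PAGE_BITS = 12  # assumed 4 KB pages; the slides do not fix a page size


def vpn_of(va):
    """Virtual page number of a virtual address."""
    return va >> PAGE_BITS


class PLB:
    """Protection lookaside buffer: one entry per (protection domain, virtual page)."""

    def __init__(self):
        self.entries = {}

    def insert(self, pd_id, vpn, rights):
        self.entries[(pd_id, vpn)] = frozenset(rights)

    def lookup(self, pd_id, va):
        """Return the rights the PD holds on the page containing va, or None on a miss."""
        return self.entries.get((pd_id, vpn_of(va)))
```

Because the key uses only the virtual address and the current PD-ID, no translation is needed before the protection check, so it can proceed alongside a virtually indexed cache access.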
Note this is different than what we’ve seen before. Address translation is outside the critical path of the CPU.
Here the TLB can be moved off-chip, allowing for potentially a much larger TLB.
Note the TLB only requires one entry for each virtual-to-physical mapping. A purge is required only on the change of a virtual-to-physical translation and not during a protection domain switch.
The Page-Group Model
This TLB takes a VPN and returns (a) a physical address, (b) rights, and (c) an access identifier (AID) that contains a page-group number.
The processor must determine whether the current PD has access to the page-group identified by the AID.
Four page-group registers (PIDs) store the set of page-groups accessible to the current PD.
If AID == 0 (global) or the AID matches one of PID1-4, then access is granted, with rights given by (a) the TLB, (b) the current CPU privilege level, and (c) a write bit.
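The grant decision in the page-group model reduces to a membership test against the four page-group registers. The sketch below covers only that decision; how the TLB rights, privilege level, and write bit combine is not specified on the slide and is left out. The names `GLOBAL_AID` and `access_granted` are hypothetical.

```python
GLOBAL_AID = 0  # AID 0 marks a page-group accessible to every domain


def access_granted(aid, pid_registers):
    """True if the current PD may access the page-group named by aid.

    pid_registers holds the page-group numbers loaded into the (at most
    four) page-group registers of the current protection domain.
    """
    if len(pid_registers) > 4:
        raise ValueError("only four page-group registers are available")
    return aid == GLOBAL_AID or aid in pid_registers
```

A `False` result corresponds to the case where an access violation is signaled and the kernel is invoked.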
Note 1: if access is not granted then an access violation is signaled and the kernel is invoked.
Note 2: the four page-group registers obviously limit the number of groups a PD can access. For eval, the authors assume an LRU cache of page-groups.
Note 3: translation and protection are combined in this TLB, thus the TLB must be on-chip. But a virtually indexed TLB and on-chip PLB could have been used as well, thus making page-grouping a bit of an orthogonal issue.
Evaluation A
Evaluation B