EC413 Computer Organization - Fall 2017 Problem Set 3

Important guidelines: Always state your assumptions and clearly explain your answers. Please upload your solution document (PDF or TXT) to Blackboard. Your submission should be a single file titled [yourname]_pset3.pdf (example: JohnDoe_pset3.pdf) or [yourname]_pset3.txt.

100/100 points possible. Due Wednesday, November 1st by 11:59 PM through Blackboard.

Notes between the square brackets are simply to help you find the equivalent of the problem or question in the textbook.

Part 1: CPU Datapath and Control Logic
Figure 1: The basic implementation of the MIPS subset, including the necessary multiplexors and control lines.
[Figure 1 diagram not reproduced. Labeled blocks: PC, Instruction memory, Registers, ALU, Data memory, two Add units, three Mux units, and Control. Labeled signals: RegWrite, MemRead, MemWrite, Branch, Zero, and ALU operation.]

Exercise 1: [4.1] Consider the following instruction:
    Instruction: AND Rd,Rs,Rt
    Interpretation: Reg[Rd] = Reg[Rs] AND Reg[Rt]
a. What are the values of control signals generated by the control in Figure 1 for the above instruction?
b. Which resources (blocks) perform a useful function for this instruction?
c. Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction?

Exercise 2: [4.2] The basic single-cycle MIPS implementation in Figure 1 can only implement some instructions. New instructions can be added to an existing Instruction Set Architecture (ISA), but the decision whether or not to do that depends, among other things, on the cost and complexity the proposed addition introduces into the processor datapath and control. The first three problems in this exercise refer to the new instruction:
    Instruction: LWI Rt,Rd(Rs)
    Interpretation: Reg[Rt] = Mem[Reg[Rd]+Reg[Rs]]
a. Which existing blocks (if any) can be used for this instruction?
b. Which new functional blocks (if any) do we need for this instruction?
c. What new signals do we need (if any) from the control unit to support this instruction?

Exercise 3: [4.3] When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance trade-off. In the following three problems, assume that we are starting with a datapath from Figure 1, where I-Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 400ps, 100ps, 30ps, 120ps, 200ps, 350ps, and 100ps, respectively, and costs of 1000, 30, 10, 100, 200, 2000, and 500, respectively. Consider the addition of a multiplier to the ALU. This addition will add 300ps to the latency of the ALU and will add a cost of 600 to the ALU. The result will be 5% fewer instructions executed since we will no longer need to emulate the MUL instruction.
a. What is the clock cycle time with and without this improvement?
b. What is the speedup achieved by adding this improvement?
c. Compare the cost/performance ratio with and without this improvement.
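The arithmetic in Exercise 3 can be organized as a short script. What follows is a minimal Python sketch under stated assumptions, not an official solution. It assumes that the clock period is set by the load-word path (I-Mem, Regs, Mux, ALU, D-Mem, Mux) and that Figure 1 contains two Add units and three Mux units for costing purposes; neither assumption is stated in the problem, so justify or replace them in your own answer. The names `LATENCY_PS`, `COST`, `LW_PATH`, `BLOCK_COUNT`, `cycle_time`, `total_cost`, and `with_multiplier` are hypothetical.

```python
# Latencies (ps) and costs taken from the Exercise 3 statement.
LATENCY_PS = {"I-Mem": 400, "Add": 100, "Mux": 30, "ALU": 120,
              "Regs": 200, "D-Mem": 350, "Control": 100}
COST = {"I-Mem": 1000, "Add": 30, "Mux": 10, "ALU": 100,
        "Regs": 200, "D-Mem": 2000, "Control": 500}

# ASSUMPTION: the load-word path is the critical path of the datapath.
LW_PATH = ["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux"]
# ASSUMPTION: block counts read off Figure 1 (two adders, three muxes).
BLOCK_COUNT = {"I-Mem": 1, "Add": 2, "Mux": 3, "ALU": 1,
               "Regs": 1, "D-Mem": 1, "Control": 1}


def cycle_time(latency, path=LW_PATH):
    """Clock period (ps) as the sum of block latencies along one path."""
    return sum(latency[block] for block in path)


def total_cost(cost, counts=BLOCK_COUNT):
    """Total cost as per-block cost times the number of such blocks."""
    return sum(cost[block] * n for block, n in counts.items())


def with_multiplier(table, extra):
    """Copy of a latency or cost table with `extra` added to the ALU entry."""
    updated = dict(table)
    updated["ALU"] += extra
    return updated


base_t = cycle_time(LATENCY_PS)
new_t = cycle_time(with_multiplier(LATENCY_PS, 300))
# Execution time scales with instruction count times clock period;
# the improved design executes 5% fewer instructions.
speedup = base_t / (0.95 * new_t)
base_c = total_cost(COST)
new_c = total_cost(with_multiplier(COST, 600))

print(base_t, new_t, round(speedup, 3), base_c, new_c)
```

A speedup below 1 from this script would mean the longer clock period outweighs the reduced instruction count under these assumptions.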
Part 2: Instruction Support Considerations in CPU Design

Figure 2: A portion of the datapath used for fetching instructions and incrementing the program counter.
Figure 3: The simple datapath for the core MIPS architecture combines the elements required by different instruction classes.
Exercise 1: [4.4] Problems in this exercise assume that logic blocks needed to implement a processor’s datapath have the following latencies:

    I-Mem   Add    Mux    ALU    Regs   D-Mem   Sign-Extend   Shift-Left-2
    200ps   70ps   20ps   90ps   90ps   250ps   15ps          10ps
[Diagrams for Figures 2 and 3 not reproduced. Figure 2 labels: PC, Read address, Instruction, Instruction memory, Add, 4. Figure 3 labels: PC, Instruction memory (Read address, Instruction), Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), Sign-extend (16 to 32 bits), Shift left 2, two Add units, ALU (Zero, ALU result), Data memory (Address, Write data, Read data), three Mux units, and the control signals RegWrite, ALUSrc, ALU operation, MemRead, MemWrite, MemtoReg, and PCSrc.]
a. If the only thing we need to do in a processor is fetch consecutive instructions (Figure 2), what would the cycle time be?
b. Consider a datapath similar to the one in Figure 3, but for a processor that only has one type of instruction: unconditional PC-relative branch. What would the cycle time be for this datapath?
c. Consider a datapath similar to the one in Figure 3, but for a processor that only has one type of instruction: conditional PC-relative branch. What would the cycle time be for this datapath?

Questions d, e, and f refer to the datapath element Shift-left-2:
d. Which kinds of instructions require this resource?
e. For which kinds of instructions (if any) is this resource on the critical path?
f. Assuming that we only support beq and add instructions, discuss how changes in the given latency of this resource affect the cycle time of the processor. Assume that the latencies of other resources do not change.

Exercise 2: [4.5] For the problems in this exercise, assume that there are no pipeline stalls and that the breakdown of executed instructions is as follows:

    add   addi   not   beq   lw    sw
    20%   20%    0%    25%   25%   10%

a. In what fraction of all cycles is the data memory used?
b. In what fraction of all cycles is the input of the sign-extend circuit needed? What is this circuit doing in cycles in which its input is not needed?
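With no stalls, each executed instruction accounts for one cycle, so a "fraction of all cycles" question reduces to summing the shares of the instructions that use a resource. The Python sketch below illustrates that bookkeeping. Which instructions use each resource is an assumption here (loads and stores for data memory; instructions with a 16-bit immediate field for sign-extend) and should be argued in your own answer. The names `MIX`, `USES_DMEM`, `USES_SIGN_EXTEND`, and `fraction` are hypothetical.

```python
# Instruction mix from the Exercise 2 table.
MIX = {"add": 0.20, "addi": 0.20, "not": 0.00,
       "beq": 0.25, "lw": 0.25, "sw": 0.10}

# ASSUMPTION: only loads and stores access the data memory.
USES_DMEM = {"lw", "sw"}
# ASSUMPTION: instructions carrying a 16-bit immediate need sign-extend.
USES_SIGN_EXTEND = {"addi", "beq", "lw", "sw"}


def fraction(mix, users):
    """Share of cycles in which any instruction in `users` executes."""
    return sum(share for op, share in mix.items() if op in users)


dmem_fraction = fraction(MIX, USES_DMEM)
sign_extend_fraction = fraction(MIX, USES_SIGN_EXTEND)
print(f"D-Mem: {dmem_fraction:.0%}, sign-extend: {sign_extend_fraction:.0%}")
```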
Part 3: Efficiency Considerations in CPU Design

Exercise 1: [4.10] In this exercise, we examine how resource hazards, control hazards, and Instruction Set Architecture (ISA) design can affect pipelined execution. Problems in this exercise refer to the following fragment of MIPS code:

    sw  r16,12(r6)
    lw  r16,8(r6)
    beq r5,r4,Label   # Assume r5!=r4
    add r5,r1,r4
    slt r5,r15,r4

Assume that individual pipeline stages have the following latencies:

    IF      ID      EX      MEM     WB
    200ps   120ps   150ps   190ps   100ps
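The stage latencies above fix the clock period of the pipeline, because every stage must fit in one cycle. As a starting point for the questions in this exercise, the following Python sketch computes the clock period and the execution time of a sequence given a stall count. It is only a baseline under stated assumptions: the number of stall cycles for each question has to be worked out by hand from the hazards involved, and the names `STAGE_PS`, `clock_period`, `ideal_cycles`, and `execution_time` are hypothetical.

```python
# Stage latencies (ps) from the table in Exercise 1 [4.10].
STAGE_PS = {"IF": 200, "ID": 120, "EX": 150, "MEM": 190, "WB": 100}


def clock_period(stages):
    """The slowest stage sets the clock period of the whole pipeline."""
    return max(stages.values())


def ideal_cycles(n_instructions, n_stages):
    """Cycles to drain n instructions through the pipeline with no stalls."""
    return n_instructions + n_stages - 1


def execution_time(n_instructions, stages, stall_cycles=0):
    """Total time (ps); `stall_cycles` must be derived from hazard analysis."""
    cycles = ideal_cycles(n_instructions, len(stages)) + stall_cycles
    return cycles * clock_period(stages)


# Five instructions in the code fragment, no stalls assumed.
print(clock_period(STAGE_PS), execution_time(5, STAGE_PS))
```

Changing a stage latency or merging two stages only requires editing `STAGE_PS`; a speedup is then the ratio of two `execution_time` results.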
a. For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we only have one memory (for both instructions and data), there is a structural hazard every time we need to fetch an instruction in the same cycle in which another instruction accesses data. To guarantee forward progress, this hazard must always be resolved in favor of the instruction that accesses data. What is the total execution time of this instruction sequence in the 5-stage pipeline that only has one memory? We have seen that data hazards can be eliminated by adding nops to the code. Can you do the same with this structural hazard? Why?
b. For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we change load/store instructions to use a register (without an offset) as the address, these instructions no longer need to use the ALU. As a result, MEM and EX stages can be overlapped and the pipeline has only 4 stages. Change this code to accommodate this changed ISA. Assuming this change does not affect clock cycle time, what speedup is achieved in this instruction sequence?
c. Assuming stall-on-branch and no delay slots, what speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage?
d. Given these pipeline stage latencies, repeat the speedup calculation from b., but take into account the (possible) change in clock cycle time. When EX and MEM are done in a single stage, most of their work can be done in parallel. As a result, the resulting EX/MEM stage has a latency that is the larger of the original two, plus 20ps needed for the work that could not be done in parallel.
e. Given these pipeline stage latencies, repeat the speedup calculation from c., taking into account the (possible) change in clock cycle time. Assume that the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10ps when branch outcome resolution is moved from EX to ID.
f. Assuming stall-on-branch and no delay slots, what is the new clock cycle time and execution time of this instruction sequence if beq address computation is moved to the MEM stage? What is the speedup from this change? Assume that the latency of the EX stage is reduced by 20ps and the latency of the MEM stage is unchanged when branch outcome resolution is moved from EX to MEM.

Exercise 2: [4.19] This exercise explores energy efficiency and its relationship with performance. Problems in this exercise assume the following energy consumption for activity in Instruction memory, Registers, and Data memory. You can assume that the other components of the datapath spend a negligible amount of energy.

    I-Mem    1 Register Read   Register Write   D-Mem Read   D-Mem Write
    140pJ    70pJ              60pJ             140pJ        120pJ

Assume that components in the datapath have the following latencies. You can assume that the other components of the datapath have negligible latencies.

    I-Mem    Control   Register Read or Write   ALU     D-Mem Read or Write
    200ps    150ps     90ps                     90ps    250ps

a. How much energy is spent to execute an ADD instruction in a single-cycle design and in the 5-stage pipelined design?
b. What is the worst-case MIPS instruction in terms of energy consumption, and what is the energy spent to execute it?
c. If energy reduction is paramount, how would you change the pipelined design? What is the percentage reduction in the energy spent by an LW instruction after this change?
d. What is the performance impact of your changes from c.?
e. We can eliminate the MemRead control signal and have the data memory be read in every cycle, i.e., we can permanently have MemRead=1. Explain why the processor still functions correctly after this change. What is the effect of this change on clock frequency and energy consumption?
f. If an idle unit spends 10% of the power it would spend if it were active, what is the energy spent by the instruction memory in each cycle? What percentage of the overall energy spent by the instruction memory does this idle energy represent?
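Per-instruction energy in this exercise is a sum over the components an instruction activates, weighted by the energy table. The Python sketch below shows one way to tabulate that sum; it is not an official solution. The activity counts are an assumption: they suppose that both register read ports are exercised by every instruction, as happens in a datapath that reads the register file unconditionally, and your own answer should state whichever assumption you adopt. The names `ENERGY_PJ`, `ACTIVITY`, and `energy` are hypothetical.

```python
# Energy per activity (pJ) from the Exercise 2 [4.19] table.
ENERGY_PJ = {"imem": 140, "reg_read": 70, "reg_write": 60,
             "dmem_read": 140, "dmem_write": 120}

# ASSUMPTION: how many times each instruction triggers each activity.
# Both register read ports are assumed active for every instruction.
ACTIVITY = {
    "add": {"imem": 1, "reg_read": 2, "reg_write": 1},
    "lw":  {"imem": 1, "reg_read": 2, "reg_write": 1, "dmem_read": 1},
    "sw":  {"imem": 1, "reg_read": 2, "dmem_write": 1},
}


def energy(instruction):
    """Energy (pJ) for one instruction: sum of count times per-activity energy."""
    return sum(ENERGY_PJ[activity] * count
               for activity, count in ACTIVITY[instruction].items())


for name in ACTIVITY:
    print(name, energy(name), "pJ")
```

Adding another instruction, or modeling a design change such as suppressing an unneeded register read, only requires editing the `ACTIVITY` table.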