
Source: ascslab.org/courses/ec413/psets/pset3.pdf


EC 413 Computer Organization - Fall 2017
Problem Set 3

Important guidelines: Always state your assumptions and clearly explain your answers. Please upload your solution document (PDF or TXT) to Blackboard. Your submission should be a single file titled [your name]_pset3.pdf (example: JohnDoe_pset3.pdf) or [your name]_pset3.txt.

100/100 points possible. Due Wednesday, November 1st by 11:59 PM through Blackboard.

Notes between the square brackets are simply to help you find the equivalent of the problem or question in the textbook.

Part 1: CPU Datapath and Control Logic

[Figure 1 diagram not reproduced. Labels recovered from the extraction: PC, Address, Instruction, Instruction memory, Registers, Register # (x3), Data, ALU, ALU operation, Zero, Data memory, Add (x2), 4, Mux (x3), Control, RegWrite, MemRead, MemWrite, Branch.]

Figure 1: The basic implementation of the MIPS subset, including the necessary multiplexors and control lines.

Exercise 1: [4.1] Consider the following instruction:

    Instruction: AND Rd,Rs,Rt
    Interpretation: Reg[Rd] = Reg[Rs] AND Reg[Rt]


a. What are the values of control signals generated by the control in Figure 1 for the above instruction?
b. Which resources (blocks) perform a useful function for this instruction?
c. Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction?

Exercise 2: [4.2] The basic single-cycle MIPS implementation in Figure 1 can only implement some instructions. New instructions can be added to an existing Instruction Set Architecture (ISA), but the decision whether or not to do that depends, among other things, on the cost and complexity the proposed addition introduces into the processor datapath and control. The first three problems in this exercise refer to the new instruction:

    Instruction: LWI Rt,Rd(Rs)
    Interpretation: Reg[Rt] = Mem[Reg[Rd]+Reg[Rs]]

a. Which existing blocks (if any) can be used for this instruction?
b. Which new functional blocks (if any) do we need for this instruction?
c. What new signals do we need (if any) from the control unit to support this instruction?

Exercise 3: [4.3] When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance trade-off. In the following three problems, assume that we are starting with a datapath from Figure 1, where I-Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps, 200 ps, 350 ps, and 100 ps, respectively, and costs of 1000, 30, 10, 100, 200, 2000, and 500, respectively. Consider the addition of a multiplier to the ALU. This addition will add 300 ps to the latency of the ALU and will add a cost of 600 to the ALU. The result will be 5% fewer instructions executed since we will no longer need to emulate the MUL instruction.

a. What is the clock cycle time with and without this improvement?
b. What is the speedup achieved by adding this improvement?
c. Compare the cost/performance ratio with and without this improvement.
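For Exercise 3, one way to organize the arithmetic is sketched below. It is only a sketch under stated assumptions: the critical path is taken to be the load path (I-Mem, Regs, Mux, ALU, D-Mem, Mux), the block counts used for total cost (two Add units, three Mux blocks) are read off Figure 1, and the names `LATENCY_PS`, `COST`, `LOAD_PATH`, `BLOCK_COUNT`, `cycle_time`, and `total_cost` are hypothetical. If you argue for a different critical path, change `LOAD_PATH` and state why.

```python
# Latencies (ps) and costs taken from the Exercise 3 statement.
LATENCY_PS = {"I-Mem": 400, "Add": 100, "Mux": 30, "ALU": 120,
              "Regs": 200, "D-Mem": 350, "Control": 100}
COST = {"I-Mem": 1000, "Add": 30, "Mux": 10, "ALU": 100,
        "Regs": 200, "D-Mem": 2000, "Control": 500}

# ASSUMPTION: the longest path through the datapath is that of a load.
LOAD_PATH = ["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux"]
# ASSUMPTION: number of instances of each block as drawn in Figure 1.
BLOCK_COUNT = {"I-Mem": 1, "Add": 2, "Mux": 3, "ALU": 1,
               "Regs": 1, "D-Mem": 1, "Control": 1}


def cycle_time(latency, path):
    """Clock cycle time as the sum of block latencies along one path."""
    return sum(latency[block] for block in path)


def total_cost(cost, count):
    """Total cost as per-block cost times the number of instances."""
    return sum(cost[block] * count[block] for block in cost)


# Proposed multiplier: +300 ps latency and +600 cost on the ALU.
latency_mul = dict(LATENCY_PS, ALU=LATENCY_PS["ALU"] + 300)
cost_mul = dict(COST, ALU=COST["ALU"] + 600)

t_base = cycle_time(LATENCY_PS, LOAD_PATH)
t_mul = cycle_time(latency_mul, LOAD_PATH)

# With CPI = 1, execution time is proportional to
# (instruction count) x (cycle time); the new design runs 5% fewer instructions.
speedup = (1.00 * t_base) / (0.95 * t_mul)

cost_base = total_cost(COST, BLOCK_COUNT)
cost_with_mul = total_cost(cost_mul, BLOCK_COUNT)

print("cycle time (ps):", t_base, "->", t_mul)
print("speedup:", round(speedup, 3))
print("cost:", cost_base, "->", cost_with_mul)
```

A speedup below 1 from this kind of calculation means the longer cycle outweighs the reduced instruction count, which is the trade-off part c asks you to weigh against cost.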


Part 2: Instruction Support Considerations in CPU Design

[Figure 2 and Figure 3 diagrams not reproduced. Labels recovered from the extraction: PC, Read address, Instruction, Instruction memory, Add, 4, Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), Sign-extend (16 to 32), Shift left 2, ALU (ALU operation, Zero, ALU result), Data memory (Address, Write data, Read data), Mux (x3), and the control signals RegWrite, ALUSrc, MemRead, MemWrite, MemtoReg, PCSrc.]

Figure 2: A portion of the datapath used for fetching instructions and incrementing the program counter.

Figure 3: The simple datapath for the core MIPS architecture combines the elements required by different instruction classes.

Exercise 1: [4.4] Problems in this exercise assume that logic blocks needed to implement a processor’s datapath have the following latencies:

    I-Mem    Add      Mux      ALU      Regs     D-Mem    Sign-Extend   Shift-Left-2
    200 ps   70 ps    20 ps    90 ps    90 ps    250 ps   15 ps         10 ps


a. If the only thing we need to do in a processor is fetch consecutive instructions (Figure 2), what would the cycle time be?
b. Consider a datapath similar to the one in Figure 3, but for a processor that only has one type of instruction: unconditional PC-relative branch. What would the cycle time be for this datapath?
c. Consider a datapath similar to the one in Figure 3, but for a processor that only has one type of instruction: conditional PC-relative branch. What would the cycle time be for this datapath?

Questions d, e, and f refer to the datapath element Shift-left-2:

d. Which kinds of instructions require this resource?
e. For which kinds of instructions (if any) is this resource on the critical path?
f. Assuming that we only support beq and add instructions, discuss how changes in the given latency of this resource affect the cycle time of the processor. Assume that the latencies of other resources do not change.

Exercise 2: [4.5] For the problems in this exercise, assume that there are no pipeline stalls and that the breakdown of executed instructions is as follows:

    add    addi   not    beq    lw     sw
    20%    20%    0%     25%    25%    10%

a. In what fraction of all cycles is the data memory used?
b. In what fraction of all cycles is the input of the sign-extend circuit needed? What is this circuit doing in cycles in which its input is not needed?
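For Exercise 2, the fractions can be tallied directly from the instruction mix. The sketch below assumes that, with no stalls, the fraction of cycles in which a unit is used equals the fraction of executed instructions that use it. It also assumes that only lw and sw access data memory and that addi, beq, lw, and sw are the instructions whose immediate field must be sign-extended. The names `MIX_PERCENT`, `USES_DATA_MEMORY`, `NEEDS_SIGN_EXTEND`, and `fraction_using` are hypothetical.

```python
# Instruction mix from the Exercise 2 statement, in percent.
MIX_PERCENT = {"add": 20, "addi": 20, "not": 0, "beq": 25, "lw": 25, "sw": 10}

# ASSUMPTION: which instruction types exercise each resource.
USES_DATA_MEMORY = {"lw", "sw"}
NEEDS_SIGN_EXTEND = {"addi", "beq", "lw", "sw"}


def fraction_using(mix, users):
    """Fraction of executed instructions whose type is in `users`."""
    total = sum(mix.values())
    used = sum(pct for instr, pct in mix.items() if instr in users)
    return used / total


dmem_fraction = fraction_using(MIX_PERCENT, USES_DATA_MEMORY)
signext_fraction = fraction_using(MIX_PERCENT, NEEDS_SIGN_EXTEND)

print(f"data memory used: {dmem_fraction:.0%}")
print(f"sign-extend input needed: {signext_fraction:.0%}")
```

The second half of part b is not a calculation: it asks what a combinational block does with an input nobody consumes, which the sketch does not address.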


Part 3: Efficiency Considerations in CPU Design

Exercise 1: [4.10] In this exercise, we examine how resource hazards, control hazards, and Instruction Set Architecture (ISA) design can affect pipelined execution. Problems in this exercise refer to the following fragment of MIPS code:

    sw r16,12(r6)
    lw r16,8(r6)
    beq r5,r4,Label # Assume r5!=r4
    add r5,r1,r4
    slt r5,r15,r4

Assume that individual pipeline stages have the following latencies:

    IF       ID       EX       MEM      WB
    200 ps   120 ps   150 ps   190 ps   100 ps

a. For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we only have one memory (for both instructions and data), there is a structural hazard every time we need to fetch an instruction in the same cycle in which another instruction accesses data. To guarantee forward progress, this hazard must always be resolved in favor of the instruction that accesses data. What is the total execution time of this instruction sequence in the 5-stage pipeline that only has one memory? We have seen that data hazards can be eliminated by adding nops to the code. Can you do the same with this structural hazard? Why?
b. For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we change load/store instructions to use a register (without an offset) as the address, these instructions no longer need to use the ALU. As a result, MEM and EX stages can be overlapped and the pipeline has only 4 stages. Change this code to accommodate this changed ISA. Assuming this change does not affect clock cycle time, what speedup is achieved in this instruction sequence?
c. Assuming stall-on-branch and no delay slots, what speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage?
d. Given these pipeline stage latencies, repeat the speedup calculation from b., but take into account the (possible) change in clock cycle time. When EX and MEM are done in a single stage, most of their work can be done in parallel. As a result, the resulting EX/MEM stage has a latency that is the larger of the original two, plus 20 ps needed for the work that could not be done in parallel.
e. Given these pipeline stage latencies, repeat the speedup calculation from c., taking into account the (possible) change in clock cycle time. Assume that the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10 ps when branch outcome resolution is moved from EX to ID.
f. Assuming stall-on-branch and no delay slots, what is the new clock cycle time and execution time of this instruction sequence if beq address computation is moved to the MEM stage? What is the speedup from this change? Assume that the latency of the EX stage is reduced by 20 ps and the latency of the MEM stage is unchanged when branch outcome resolution is moved from EX to MEM.

Exercise 2: [4.19] This exercise explores energy efficiency and its relationship with performance. Problems in this exercise assume the following energy consumption for activity in Instruction memory, Registers, and Data memory. You can assume that the other components of the datapath spend a negligible amount of energy.

    I-Mem     1 Register Read   Register Write   D-Mem Read   D-Mem Write
    140 pJ    70 pJ             60 pJ            140 pJ       120 pJ

Assume that components in the datapath have the following latencies. You can assume that the other components of the datapath have negligible latencies.

    I-Mem     Control   Register Read or Write   ALU      D-Mem Read or Write
    200 ps    150 ps    90 ps                    90 ps    250 ps

a. How much energy is spent to execute an ADD instruction in a single-cycle design and in the 5-stage pipelined design?
b. What is the worst-case MIPS instruction in terms of energy consumption, and what is the energy spent to execute it?
c. If energy reduction is paramount, how would you change the pipelined design? What is the percentage reduction in the energy spent by an LW instruction after this change?
d. What is the performance impact of your changes from c.?
e. We can eliminate the MemRead control signal and have the data memory be read in every cycle, i.e., we can permanently have MemRead=1. Explain why the processor still functions correctly after this change. What is the effect of this change on clock frequency and energy consumption?
f. If an idle unit spends 10% of the power it would spend if it were active, what is the energy spent by the instruction memory in each cycle? What percentage of the overall energy spent by the instruction memory does this idle energy represent?
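For the energy questions in Exercise 2, per-instruction energy can be tallied by summing the energy of each activity an instruction triggers. The sketch below is one way to set that up, not the expected answer format. It assumes every instruction fetches from I-Mem and reads two registers (as the unmodified datapath does regardless of instruction type). The activity lists for `add`, `lw`, and `sw` and the names `ENERGY_PJ`, `ACTIVITY`, and `energy_pj` are hypothetical.

```python
# Energy per activity (pJ) from the Exercise 2 statement.
ENERGY_PJ = {
    "imem": 140,        # instruction memory access
    "reg_read": 70,     # one register read
    "reg_write": 60,    # register write
    "dmem_read": 140,   # data memory read
    "dmem_write": 120,  # data memory write
}

# ASSUMPTION: activities triggered by each instruction type; every
# instruction is taken to read two registers whether or not it needs both.
ACTIVITY = {
    "add": ["imem", "reg_read", "reg_read", "reg_write"],
    "lw": ["imem", "reg_read", "reg_read", "dmem_read", "reg_write"],
    "sw": ["imem", "reg_read", "reg_read", "dmem_write"],
}


def energy_pj(instr):
    """Total energy (pJ) spent by one instruction under the assumptions above."""
    return sum(ENERGY_PJ[activity] for activity in ACTIVITY[instr])


for name in ACTIVITY:
    print(name, energy_pj(name), "pJ")

# Instruction type with the largest total among those listed.
worst = max(ACTIVITY, key=energy_pj)
print("highest energy among listed instructions:", worst)
```

Removing an entry from an activity list (for example, one `reg_read` for an instruction that only needs one source register) is a quick way to estimate the percentage reduction asked for in part c.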