Datorarkitektur 1 & Datorsystem 1 (Computer Architecture 1 & Computer Systems 1): lecture 10, Wednesday 14 November 2007

  • Photo: Rona Proudfoot (some rights reserved)

  • 1977

  • 1982

  • 1982

  • 1982

  • 1986

  • 1986

  • 1990

  • 1991

    42 MB Disk

    1.4 MB Floppy

    1 MB RAM

  • 1996

  • 2002

  • 2007

  • What determines whether a program runs fast or slowly?

  • How large the program is, i.e. the number of lines of code (LOC)... the number of instructions... the compiler

    How often the processor can carry out a task: the clock cycle time...

    Clock cycles per instruction (CPI)... depends on the hardware!

  • Clock cycles: instead of reporting execution time in seconds, we often use cycles. Clock ticks indicate when to start activities (one abstraction):

    clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec); cycle time = seconds per cycle. How long is the cycle time for a 4 GHz processor? How can we improve performance, i.e. reduce seconds per program?
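The 4 GHz question follows directly from cycle time = 1 / clock rate. A minimal Python sketch; the helper name `cycle_time_ns` is hypothetical and only used for this illustration:

```python
def cycle_time_ns(clock_rate_hz: float) -> float:
    """Return the clock cycle time in nanoseconds for a clock rate given in Hz."""
    seconds_per_cycle = 1.0 / clock_rate_hz  # cycle time = 1 / clock rate
    return seconds_per_cycle * 1e9           # seconds -> nanoseconds


if __name__ == "__main__":
    # 4 GHz = 4 * 10**9 cycles per second
    print(f"{cycle_time_ns(4e9):.2f} ns")  # prints "0.25 ns"
```

So a 4 GHz clock ticks every 0.25 ns, i.e. every 250 ps.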

  • So, to improve performance (everything else being equal), you can either (increase or decrease?)

    ________ the number of cycles required for a program, or ________ the clock cycle time or, said another way, ________ the clock rate. (Answers: decrease, decrease, increase.)

  • $\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle\_time}$, or equivalently $\text{CPU time} = \dfrac{\text{Instruction\_count} \times \text{CPI}}{\text{clock\_rate}}$

    These equations separate the three key factors that affect performance. How each quantity can be obtained:

    CPU execution time: can be measured by running the program.

    Clock rate: usually given in the documentation.

    Instruction count: can be measured with profilers/simulators without knowing all of the implementation details.

    CPI (clock cycles per instruction): varies by instruction type and ISA implementation, for which we must know the implementation details.

    $\text{cycle\_time} = 1/\text{clock\_rate}$

    $\text{clock\_rate} = 1/\text{cycle\_time}$
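A quick numerical check that the two forms of the CPU-time equation agree, since cycle_time = 1/clock_rate. The instruction count (executed instructions), CPI and clock rate below are made-up values, and the function names are hypothetical:

```python
def cpu_time_from_cycle_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time = Instruction_count x CPI x clock_cycle_time, in seconds."""
    return instruction_count * cpi * cycle_time_s


def cpu_time_from_clock_rate(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction_count x CPI / clock_rate, in seconds."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Hypothetical program: 2 * 10**9 executed instructions, average CPI 1.5, 4 GHz clock.
    ic, cpi, rate = 2_000_000_000, 1.5, 4e9
    t1 = cpu_time_from_cycle_time(ic, cpi, 1.0 / rate)
    t2 = cpu_time_from_clock_rate(ic, cpi, rate)
    print(f"{t1:.3f} s, {t2:.3f} s")  # prints "0.750 s, 0.750 s"
```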

  • How many cycles are required for a program? We could assume that the number of cycles equals the number of instructions. Is this assumption correct?

  • Different numbers of cycles for different instructions:

    Multiplication takes more time than addition

    Floating-point operations take longer than integer ones

    Accessing memory takes more time than accessing registers

    Changing the cycle time often changes the number of cycles required for various instructions
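Because different instructions need different numbers of cycles, the CPI in the CPU-time equation is in practice an average weighted by how often each instruction class is executed. A minimal sketch; the class names, counts and cycle costs are hypothetical illustration values, not measurements of any real processor:

```python
def average_cpi(mix: dict[str, tuple[int, int]]) -> float:
    """Weighted average CPI.

    mix maps an instruction class to (executed_count, cycles_per_instruction).
    """
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles / total_instructions


if __name__ == "__main__":
    # Hypothetical dynamic instruction mix (executed counts) and per-class cycle costs.
    mix = {
        "integer add": (500, 1),
        "integer multiply": (100, 4),
        "memory access": (300, 3),
        "floating point": (100, 6),
    }
    print(average_cpi(mix))  # prints 2.4
```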

  • Fetch, PC = PC + 4, Decode, Exec (figure). Our implementation of the MIPS is simplified:

    Memory-reference instructions: lw, sw

    Arithmetic-logical instructions: add, sub, and, or, slt

    Control flow instructions: beq, j

    Generic implementation:

    1 Use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC)

    2 Decode the instruction (and read registers)

    3 Execute the instruction (possibly write registers)
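The three generic steps can be sketched as a tiny interpreter loop. To keep it short, this sketch assumes instructions are already decoded into tuples (a made-up format, not real MIPS machine code), memory is a dict keyed by byte address, and 32-bit wraparound is ignored; `run` and `ALU_OPS` are hypothetical names:

```python
# Hypothetical pre-decoded instruction tuples (illustration only):
#   (op, rd, rs, rt)         for add, sub, and, or, slt
#   (op, rt, offset, rs)     for lw/sw, address = regs[rs] + offset (bytes)
#   ("beq", rs, rt, offset)  offset counted in instructions, relative to PC + 4
#   ("j", index)             jump to instruction number `index`
ALU_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "slt": lambda a, b: int(a < b),
}


def run(program, regs, memory, max_steps=1000):
    """Fetch-decode-execute loop over the subset lw, sw, add, sub, and, or, slt, beq, j."""
    pc = 0
    for _ in range(max_steps):
        if pc // 4 >= len(program):
            break                         # ran off the end of the program
        instr = program[pc // 4]          # 1 fetch the instruction at PC ...
        pc += 4                           #   ... and update the PC
        op, *fields = instr               # 2 decode (registers are read below)
        if op in ALU_OPS:                 # 3 execute
            rd, rs, rt = fields
            result = ALU_OPS[op](regs[rs], regs[rt])
            if rd != 0:                   # register 0 always holds zero
                regs[rd] = result
        elif op == "lw":
            rt, offset, rs = fields
            if rt != 0:
                regs[rt] = memory[regs[rs] + offset]
        elif op == "sw":
            rt, offset, rs = fields
            memory[regs[rs] + offset] = regs[rt]
        elif op == "beq":
            rs, rt, offset = fields
            if regs[rs] == regs[rt]:
                pc += 4 * offset          # PC already points to the next instruction
        elif op == "j":
            (index,) = fields
            pc = 4 * index
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs, memory


if __name__ == "__main__":
    regs = [0] * 8
    memory = {0: 7, 4: 5}
    program = [
        ("lw", 1, 0, 0),   # r1 = mem[0]
        ("lw", 2, 4, 0),   # r2 = mem[4]
        ("sub", 3, 1, 2),  # r3 = r1 - r2
        ("slt", 4, 2, 1),  # r4 = 1 if r2 < r1 else 0
        ("sw", 3, 8, 0),   # mem[8] = r3
    ]
    run(program, regs, memory)
    print(regs[3], regs[4], memory[8])  # prints "2 1 2"
```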

  • When can signals be read and written? An edge-triggered methodology (figure: state element → combinational logic → state element, all within one clock cycle):

    1 Read the contents of state elements

    2 Send values through combinational logic

    3 Write results to one or more state elements

    Aha! A state element can be read and written in the same clock cycle! How long does it take to reach a stable state?
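The "Aha!" point can be modelled in a few lines: a write only takes effect at the clock edge, so reads during the cycle still see the old value and there is no feedback loop within the cycle. This is a minimal sketch assuming an idealized register that ignores propagation delay and setup/hold times (the real answer to "how long until stable?" is exactly what it leaves out); the class name is hypothetical:

```python
class EdgeTriggeredRegister:
    """A state element whose stored value changes only at a clock edge."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._pending = None   # value on the input, latched at the next edge

    def read(self) -> int:
        return self.value      # readers see the value stored at the last edge

    def write(self, value: int) -> None:
        self._pending = value  # does not change what read() returns yet

    def clock_edge(self) -> None:
        if self._pending is not None:
            self.value = self._pending
            self._pending = None


if __name__ == "__main__":
    pc = EdgeTriggeredRegister(0)
    for _ in range(3):
        current = pc.read()          # 1 read the state element
        next_pc = current + 4        # 2 combinational logic (an adder)
        pc.write(next_pc)            # 3 write the result ...
        assert pc.read() == current  #   ... the old value is still visible this cycle
        pc.clock_edge()              # the edge latches the new value
    print(pc.read())  # prints 12
```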

  • We start building a datapath (figure: PC, instruction memory with a read address input and a 32-bit instruction output, and an adder that adds 4 to the PC).

    To fetch an instruction from memory, we simply read at the location the PC indicates. To fetch the next instruction, we advance the PC by 32 bits, i.e. four bytes.

  • Design principles:

    Good design demands good compromises

    Make the common case fast

    Simplicity favors regularity

    Smaller is faster

  • I-type and R-type instruction formats (figure)

  • DRAM, SRAM, register?

  • A simple datapath (figure: instruction memory, PC and +4 adder; register file with Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2 and RegWrite; ALU with an ALUSrc multiplexor; sign extend 16 → 32 bits; data memory with Address, Write Data, Read Data, MemWrite and MemRead).

    Putting fetch together... with R-type (add, sub, and, or, slt...) and I-type lw/sw.
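The "Sign Extend 16 → 32" box widens the 16-bit offset of an I-type lw/sw so the ALU can add it to a 32-bit base register. A minimal sketch of what that box computes, treating words as unsigned 32-bit patterns; the function names and the example instruction are hypothetical:

```python
def sign_extend_16_to_32(imm16: int) -> int:
    """Replicate bit 15 of a 16-bit field into bits 31-16 (result as a 32-bit pattern)."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:            # negative: fill the upper half with ones
        return imm16 | 0xFFFF0000
    return imm16                  # non-negative: upper half stays zero


def to_signed32(word: int) -> int:
    """Read a 32-bit pattern as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


if __name__ == "__main__":
    # e.g. the offset field of a hypothetical "lw $t0, -4($sp)"
    print(hex(sign_extend_16_to_32(0xFFFC)))          # prints 0xfffffffc
    print(to_signed32(sign_extend_16_to_32(0xFFFC)))  # prints -4
    print(hex(sign_extend_16_to_32(0x0004)))          # prints 0x4
```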

  • beq $t1, $t2, my_label: how do branch instructions work...
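In MIPS, the assembler encodes `my_label` as a 16-bit signed word offset relative to PC + 4. The hardware sign-extends it, shifts it left 2 (words to bytes), adds it to PC + 4, and takes the branch only if the ALU's subtraction gives Zero. A minimal sketch of that next-PC computation; the addresses and function names are hypothetical:

```python
MASK32 = 0xFFFFFFFF


def sign_extend_16(imm16: int) -> int:
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """Next PC after beq rs, rt, offset (offset counted in words from PC + 4)."""
    pc_plus_4 = (pc + 4) & MASK32
    target = (pc_plus_4 + (sign_extend_16(offset16) << 2)) & MASK32  # shift left 2
    zero = ((rs_value - rt_value) & MASK32) == 0  # ALU subtracts, Zero flag
    return target if zero else pc_plus_4          # PCSrc = Branch AND Zero


if __name__ == "__main__":
    # Hypothetical beq at 0x00400010 with offset -2 (two words before PC + 4).
    print(hex(next_pc_beq(0x00400010, 5, 5, 0xFFFE)))  # taken: prints 0x40000c
    print(hex(next_pc_beq(0x00400010, 5, 6, 0xFFFE)))  # not taken: prints 0x400014
```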

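  • Instruction formats (all 32 bits):

    R-type: op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)

    I-type: op (6 bits) | rs (5) | rt (5) | address/constant (16)

    J-type: op (6 bits) | target address (26)

    The op field is always in the same place (bits 31-26).

    The operand-1 register (rs) is always in the same place, even for lw/sw, where it is the base register.

    The operand-2 register (rt) is always in the same place for add, sub, and, slt, etc., but also for sw (the value to store).

    The result is written to rd for add, sub, and, slt, etc., and to rt for lw.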
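Because the fields sit in fixed positions, the hardware can pick out op, rs and rt before it knows the instruction type. A small decoder sketch; the two hex words are hand-encoded `add $t0, $t1, $t2` (0x012A4020) and `lw $t0, 4($sp)` (0x8FA80004), with register numbers $t0 = 8, $t1 = 9, $t2 = 10, $sp = 29. The function name is hypothetical, and only opcode 2 (j) is treated as J-type, matching the subset in this lecture:

```python
def decode(word: int) -> dict[str, int]:
    """Split a 32-bit MIPS instruction word into its fields."""
    word &= 0xFFFFFFFF
    op = (word >> 26) & 0x3F                # bits 31-26: always the opcode
    if op == 0:                             # R-type (add, sub, and, or, slt)
        return {"op": op,
                "rs": (word >> 21) & 0x1F,  # bits 25-21: operand 1
                "rt": (word >> 16) & 0x1F,  # bits 20-16: operand 2
                "rd": (word >> 11) & 0x1F,  # bits 15-11: destination
                "shamt": (word >> 6) & 0x1F,
                "funct": word & 0x3F}       # bits 5-0: selects the ALU operation
    if op == 2:                             # J-type (j)
        return {"op": op, "target": word & 0x3FFFFFF}
    return {"op": op,                       # I-type (lw, sw, beq, ...)
            "rs": (word >> 21) & 0x1F,      # base register for lw/sw
            "rt": (word >> 16) & 0x1F,      # destination for lw, source for sw
            "imm": word & 0xFFFF}           # 16-bit address/constant


if __name__ == "__main__":
    print(decode(0x012A4020))  # add $t0, $t1, $t2
    print(decode(0x8FA80004))  # lw  $t0, 4($sp)
```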

  • Single-cycle design: fetch, decode and execute each instruction in one clock cycle

    No datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders)

    Multiplexors needed at the input of shared elements with control lines to do the selection

    Write signals to control writing to the Register File and Data Memory

    Cycle time is determined by length of the longest path

  • Single-cycle datapath with control unit (figure: PC, instruction memory, register file, ALU with zero and overflow outputs, data memory, sign extend 16 → 32, shift left 2, adders and multiplexors; the Control Unit decodes Instr[31-26] into RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; the ALU control combines ALUOp with Instr[5-0]; the register file is addressed by Instr[25-21], Instr[20-16] and Instr[15-11]; PCSrc selects the next PC)
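The Control Unit is a function from the opcode (Instr[31-26]) to the control signals. The sketch below writes it as a lookup table for R-type, lw, sw and beq, assuming the signal values and multiplexor polarities of the textbook single-cycle MIPS design (RegDst = 1 selects Instr[15-11], ALUSrc = 1 selects the sign-extended immediate, MemtoReg = 1 selects data memory). Since the lecture notes mention that some mux inputs were swapped between slides, check the polarities against the figure; the function names are hypothetical:

```python
# Main control signals per opcode for the single-cycle subset (R-type, lw, sw, beq).
# None marks a "don't care" (X) entry.
CONTROL = {
    0b000000: {"RegDst": 1, "ALUSrc": 0, "MemtoReg": 0, "RegWrite": 1,
               "MemRead": 0, "MemWrite": 0, "Branch": 0, "ALUOp": 0b10},  # R-type
    0b100011: {"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "RegWrite": 1,
               "MemRead": 1, "MemWrite": 0, "Branch": 0, "ALUOp": 0b00},  # lw
    0b101011: {"RegDst": None, "ALUSrc": 1, "MemtoReg": None, "RegWrite": 0,
               "MemRead": 0, "MemWrite": 1, "Branch": 0, "ALUOp": 0b00},  # sw
    0b000100: {"RegDst": None, "ALUSrc": 0, "MemtoReg": None, "RegWrite": 0,
               "MemRead": 0, "MemWrite": 0, "Branch": 1, "ALUOp": 0b01},  # beq
}


def control_unit(opcode: int) -> dict:
    """Decode Instr[31-26] into the main control signals."""
    if opcode not in CONTROL:
        raise ValueError(f"opcode {opcode:06b} is not in this subset")
    return CONTROL[opcode]


def pc_src(branch: int, zero: bool) -> int:
    """PCSrc selects the branch target only for a taken beq (Branch AND Zero)."""
    return int(bool(branch) and zero)


if __name__ == "__main__":
    lw = control_unit(0b100011)
    print(lw["ALUSrc"], lw["MemtoReg"], lw["RegDst"])  # prints "1 1 0"
```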

  • ALU (figure: two 32-bit operands, a 4-bit ALU control input, and zero and overflow (ovf) outputs). This is a subset of all the possible operations.
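A behavioural sketch of such an ALU, assuming the 4-bit control encoding commonly used for the textbook MIPS ALU (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 set on less than, 1100 NOR). Results are returned as unsigned 32-bit patterns; overflow is computed only for add and subtract; the function names are hypothetical:

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int) -> tuple[int, bool, bool]:
    """Return (result, zero, overflow) for a 4-bit ALU control value."""
    sa, sb = to_signed32(a), to_signed32(b)
    overflow = False
    if control == 0b0000:                # AND
        result = (a & b) & MASK32
    elif control == 0b0001:              # OR
        result = (a | b) & MASK32
    elif control in (0b0010, 0b0110):    # add, subtract
        exact = sa + sb if control == 0b0010 else sa - sb
        overflow = not (-(1 << 31) <= exact < (1 << 31))
        result = exact & MASK32
    elif control == 0b0111:              # set on less than (signed)
        result = int(sa < sb)
    elif control == 0b1100:              # NOR
        result = ~(a | b) & MASK32
    else:
        raise ValueError(f"unsupported ALU control {control:04b}")
    return result, result == 0, overflow


if __name__ == "__main__":
    print(alu(0b0110, 5, 5))           # prints (0, True, False)
    print(alu(0b0010, 0x7FFFFFFF, 1))  # prints (2147483648, False, True)
    print(alu(0b0111, -1, 0))          # prints (1, False, False)
```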

  • Truth table, Karnaugh map, hardware
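Assuming this slide refers to designing the ALU control (the unit that turns ALUOp and Instr[5-0] into the 4-bit ALU control value), the three steps can be sketched as follows: a lookup table plays the role of the truth table, and a second function implements one minimized set of logic equations of the kind a Karnaugh map gives when unused input combinations are treated as don't-cares. The funct codes are those of MIPS add, sub, and, or and slt; the 4-bit outputs assume the textbook encoding (0010 add, 0110 subtract, 0000 AND, 0001 OR, 0111 slt); the function names are hypothetical:

```python
# Truth table: (ALUOp, funct) -> 4-bit ALU control value.
FUNCT_TABLE = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # slt
}


def alu_control_table(alu_op: int, funct: int) -> int:
    """Look the answer up in the truth table."""
    if alu_op == 0b00:         # lw/sw: add to form the address
        return 0b0010
    if alu_op == 0b01:         # beq: subtract, then test Zero
        return 0b0110
    return FUNCT_TABLE[funct]  # R-type (ALUOp = 10): decided by funct


def alu_control_gates(alu_op: int, funct: int) -> int:
    """Minimized logic equations (unused input combinations are don't-cares)."""
    op1, op0 = (alu_op >> 1) & 1, alu_op & 1
    f = [(funct >> i) & 1 for i in range(6)]  # f[0] is funct bit 0
    bit3 = 0
    bit2 = op0 | (op1 & f[1])
    bit1 = (1 - op1) | (1 - f[2])             # NOT ALUOp1 OR NOT F2
    bit0 = op1 & (f[3] | f[0])
    return (bit3 << 3) | (bit2 << 2) | (bit1 << 1) | bit0


if __name__ == "__main__":
    # Compare both versions on every input this subset can produce.
    cases = [(op, f) for op in (0b00, 0b01) for f in range(64)]
    cases += [(0b10, f) for f in FUNCT_TABLE]
    mismatches = [c for c in cases if alu_control_table(*c) != alu_control_gates(*c)]
    print(f"{len(cases)} cases checked, {len(mismatches)} mismatches")  # prints "133 cases checked, 0 mismatches"
```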

  • R-type instruction data/control flow (figure: the single-cycle datapath with control unit)

  • Load word instruction data/control flow (figure: the single-cycle datapath with control unit)

  • NOTE: this is a single-cycle implementation. The clock cycle time must be long enough for the longest possible path.

    A good candidate for the longest path? Load word, which uses five functional units: instruction memory, register file, ALU, data memory, register file.

    An R-type instruction such as add uses only four functional units: instruction memory, register file, ALU, register file.
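A minimal sketch of "the cycle time is set by the longest path": add up the latencies of the functional units each instruction class passes through, then take the maximum. The latencies are assumed illustration values (not from the lecture), and the names are hypothetical:

```python
# Assumed latencies in picoseconds (illustrative values only).
LATENCY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Functional units on each instruction class's path through the single-cycle datapath.
PATHS = {
    "lw": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw": ["imem", "reg_read", "alu", "dmem"],
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "beq": ["imem", "reg_read", "alu"],
}


def path_delay_ps(units: list[str]) -> int:
    """Total delay along one path."""
    return sum(LATENCY_PS[u] for u in units)


def single_cycle_clock_ps() -> int:
    """The clock period must cover the slowest (longest) path."""
    return max(path_delay_ps(units) for units in PATHS.values())


if __name__ == "__main__":
    clock = single_cycle_clock_ps()
    for name, units in PATHS.items():
        delay = path_delay_ps(units)
        print(f"{name}: {delay} ps used, {clock - delay} ps idle")
    print(f"clock period: {clock} ps")
```

With these assumed numbers lw needs 800 ps, so every instruction gets an 800 ps cycle; sw, for example, leaves 100 ps of it unused.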

  • Single-cycle disadvantages and advantages:

    Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction, which is especially problematic for more complex instructions like floating-point multiply

    May waste area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle

    But: it is simple and easy to understand

  • Multicycle datapath approach:

    Let an instruction take more than one clock cycle to complete

    Break up instructions into steps, where each step takes one cycle, while trying to balance the amount of work done in each step and restricting each cycle to use only one major functional unit

    Not every instruction takes the same number of clock cycles

    In addition to faster clock rates, multicycle allows functional units to be used more than once per instruction, as long as they are used in different clock cycles. As a result, only one memory is needed, but only one memory access per cycle is possible; only one ALU/adder is needed, but only one ALU operation per cycle is possible.

  • Multicycle datapath approach, continued: at the end of a cycle

    Store values needed by the current instruction in a later cycle in an internal register (not visible to the programmer). All of these (except IR) hold data only between a pair of adjacent clock cycles, so no write control signal is needed

    IR: Instruction Register; MDR: Memory Data Register; A, B: register file read data registers; ALUOut: ALU output register

    Data used by subsequent instructions are stored in programmer-visible registers (i.e., register file, PC, or memory)

  • The multicycle datapath with control signals (figure: a single memory for instructions and data, PC, register file, one ALU, internal registers IR, A, B and ALUOut, sign extend, two shift-left-2 units, and control signals IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCWrite, PCWriteCond, PCSource, ALUOp, ALUSrcA, ALUSrcB, RegWrite and RegDst)

  • The five steps of the load instruction (figure: lw occupies IFetch, Dec, Exec, Mem and WB in cycles 1 to 5):

    IFetch: instruction fetch and update PC

    Dec: instruction decode, register read, sign-extend offset

    Exec: execute R-type; calculate memory address; branch comparison; branch and jump completion

    Mem: memory read; memory write completion; R-type completion (register file write)

    WB: memory read completion (register file write)

    INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!
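Reading the step list as "where each instruction completes" gives the multicycle cycle counts: beq and j finish in Exec (3 cycles), R-type and sw in Mem (4), and lw needs WB as well (5). A minimal sketch plugging those counts into CPU time = instruction count × CPI × cycle time; the instruction mix, the 800 ps single-cycle period and the 200 ps multicycle period are all assumed illustration values, and the names are hypothetical:

```python
# Cycles per instruction class in the multicycle design, following the five steps.
MULTICYCLE_CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}


def total_time_ps(mix: dict[str, int], cycles: dict[str, int], clock_ps: int) -> int:
    """Total execution time for a dynamic instruction mix (executed counts per class)."""
    total_cycles = sum(count * cycles[kind] for kind, count in mix.items())
    return total_cycles * clock_ps


if __name__ == "__main__":
    # Hypothetical mix of 100 executed instructions and assumed clock periods.
    mix = {"lw": 25, "sw": 10, "R-type": 45, "beq": 15, "j": 5}
    single = total_time_ps(mix, {kind: 1 for kind in mix}, clock_ps=800)
    multi = total_time_ps(mix, MULTICYCLE_CYCLES, clock_ps=200)
    print(f"single-cycle: {single} ps, multicycle: {multi} ps")
    # prints "single-cycle: 80000 ps, multicycle: 81000 ps"
```

With these particular assumptions the multicycle version is not faster (average CPI 4.05 at 200 ps versus CPI 1 at 800 ps); the outcome depends entirely on the assumed mix and clock periods.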

  • Hungry!

    Note that instruction count is dynamic, i.e., it is not the number of lines in the code but THE NUMBER OF INSTRUCTIONS EXECUTED.

    Note that the mux control inputs have been swapped (for three of the muxes) from the last picture to be consistent with the book.

    In the single-cycle implementation, the cycle time is set to accommodate the longest instruction, the load instruction. Since the cycle time has to be long enough for the load instruction, it is too long for the store instruction, so the last part of the cycle is wasted.

    As shown here, each of these five steps takes one clock cycle to complete.