f37 Book Intarch Pres Pt1

  • 7/30/2019 f37 Book Intarch Pres Pt1

    1/91

    Jan. 2011 Computer Architecture, Background and Motivation Slide 1

    Part I: Background and Motivation

    About This Presentation

    This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. Behrooz Parhami

    Edition | Released | Revised
    First | June 2003 | July 2004, June 2005, Mar. 2006, Jan. 2007, Jan. 2008, Jan. 2009, Jan. 2011
    Second | |

    I Background and Motivation

    Topics in This Part
    Chapter 1 Combinational Digital Circuits
    Chapter 2 Digital Circuits with Memory
    Chapter 3 Computer System Technology
    Chapter 4 Computer Performance

    Provide motivation, paint the big picture, introduce tools:
    • Review components used in building digital circuits
    • Present an overview of computer technology
    • Understand the meaning of computer performance
      (or why a 2 GHz processor isn't 2× as fast as a 1 GHz model)

    1 Combinational Digital Circuits

    First of two chapters containing a review of digital design:
    • Combinational, or memoryless, circuits in Chapter 1
    • Sequential circuits, with memory, in Chapter 2

    Topics in This Chapter
    1.1 Signals, Logic Operators, and Gates
    1.2 Boolean Functions and Expressions
    1.3 Designing Gate Networks
    1.4 Useful Combinational Parts
    1.5 Programmable Combinational Parts
    1.6 Timing and Circuit Considerations

    1.1 Signals, Logic Operators, and Gates

    Figure 1.1 Some basic elements of digital logic circuits, with operator signs used in this book highlighted.

    Name | NOT | AND | OR | XOR
    Operator sign | $x'$ | $xy$ | $x \lor y$ | $x \oplus y$
    Arithmetic expression | $1 - x$ | $x \times y$ or $xy$ | $x + y - xy$ | $x + y - 2xy$
    Output is 1 iff: | Input is 0 | Both inputs are 1s | At least one input is 1 | Inputs are not equal

    The Arithmetic Substitution Method

    $z' = 1 - z$   NOT converted to arithmetic form
    $xy$   AND same as multiplication (when doing the algebra, set $z^k = z$)
    $x \lor y = x + y - xy$   OR converted to arithmetic form
    $x \oplus y = x + y - 2xy$   XOR converted to arithmetic form

    Example: Prove the identity $xyz \lor x' \lor y' \lor z' \stackrel{?}{\equiv} 1$

    LHS $= [xyz \lor x'] \lor [y' \lor z']$
    $= [xyz + 1 - x - (1 - x)xyz] \lor [1 - y + 1 - z - (1 - y)(1 - z)]$
    $= [xyz + 1 - x] \lor [1 - yz]$
    $= (xyz + 1 - x) + (1 - yz) - (xyz + 1 - x)(1 - yz)$
    $= 1 + xy^2z^2 - xyz$
    $= 1 =$ RHS

    In the arithmetic forms, + is addition, not logical OR.
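
    The substitution rules lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the original slides: the helper names `not_`, `and_`, `or_`, `xor_`, and `lhs` are illustrative. It evaluates the arithmetic forms for every 0/1 input combination and tests whether the left-hand side of the example identity always equals 1.

```python
from itertools import product

# Arithmetic forms of the logic operators; valid only for 0/1 arguments.
def not_(x):
    return 1 - x

def and_(x, y):
    return x * y

def or_(x, y):
    return x + y - x * y

def xor_(x, y):
    return x + y - 2 * x * y

def lhs(x, y, z):
    """Arithmetic form of xyz OR x' OR y' OR z'."""
    left = or_(and_(and_(x, y), z), not_(x))
    right = or_(not_(y), not_(z))
    return or_(left, right)

# Exhaustive check over all 8 input combinations.
identity_holds = all(lhs(x, y, z) == 1 for x, y, z in product((0, 1), repeat=3))
print(identity_holds)
```

    Exhaustive evaluation complements the algebraic proof: it does not show why the identity holds, but it quickly catches an algebra slip.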

    Variations in Gate Symbols

    Figure 1.2 Gates with more than two inputs and/or with inverted signals at input or output.

    Gates shown: AND, OR, NAND, NOR, XNOR

    Gates as Control Elements

    Figure 1.3 An AND gate and a tristate buffer act as controlled switches or valves. An inverting buffer is logically the same as a NOT gate.

    (a) AND gate for controlled transfer: data in x, enable/pass signal e, data out x or 0
    (b) Tristate buffer: data in x, enable/pass signal e, data out x or high impedance
    (c) Model for AND switch: output is 0 or x, selected by e
    (d) Model for tristate buffer: output is no data or x, selected by e

    Wired OR and Bus Connections

    Figure 1.4 Wired OR allows tying together of several controlled signals.

    (a) Wired OR of product terms: data out is x, y, z, or 0
    (b) Wired OR of tristate outputs: data out is x, y, z, or high impedance
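
    The behavior of the controlled switches and their wired-OR connection can be sketched in a few lines of Python. This is an illustrative model only: the names `HIGH_Z`, `and_switch`, `tristate`, `wired_or_products`, and `wired_or_tristate` are hypothetical, and the rule that at most one tristate driver may be enabled at a time is an assumption about normal bus use, not something stated on the slides.

```python
from typing import Optional, Sequence

HIGH_Z = None  # stand-in for the high-impedance (disconnected) state

def and_switch(x: int, e: int) -> int:
    """AND gate used as a switch: passes x when e = 1, outputs 0 otherwise."""
    return x & e

def tristate(x: int, e: int) -> Optional[int]:
    """Tristate buffer: passes x when e = 1, floats (HIGH_Z) otherwise."""
    return x if e == 1 else HIGH_Z

def wired_or_products(data: Sequence[int], enables: Sequence[int]) -> int:
    """Wired OR of AND-gated signals: the result is 0 when nothing is enabled."""
    result = 0
    for x, e in zip(data, enables):
        result |= and_switch(x, e)
    return result

def wired_or_tristate(data: Sequence[int], enables: Sequence[int]) -> Optional[int]:
    """Shared line driven by tristate buffers (assumed: at most one enabled)."""
    driven = [x for x, e in zip(data, enables) if tristate(x, e) is not HIGH_Z]
    if len(driven) > 1:
        raise ValueError("bus conflict: more than one driver enabled")
    return driven[0] if driven else HIGH_Z

print(wired_or_products([1, 0, 1], [0, 0, 1]))
print(wired_or_tristate([1, 0, 1], [0, 0, 0]))
```

    The two variants differ only in what an idle line looks like: 0 for the AND-gated version, and a floating (undriven) line for the tristate version.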

    Control/Data Signals and Signal Bundles

    Figure 1.5 Arrays of logic gates represented by a single gate symbol.

    (a) 8 NOR gates, with 8-bit signal bundles
    (b) 32 AND gates, with 32-bit signal bundles and a single Enable signal
    (c) k XOR gates, with k-bit signal bundles and a single Compl signal

    1.2 Boolean Functions and Expressions

    Ways of specifying a logic function

    • Truth table: $2^n$ rows, don't-care in input or output
    • Logic expression: w(xyz), product-of-sums, sum-of-products, equivalent expressions
    • Word statement: Alarm will sound if the door is opened while the security system is engaged, or when the smoke detector is triggered
    • Logic circuit diagram: Synthesis vs analysis

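    Manipulating Logic Expressions

    Table 1.2 Laws (basic identities) of Boolean algebra.

    Name of law | OR version | AND version
    Identity | $x \lor 0 = x$ | $x1 = x$
    One/Zero | $x \lor 1 = 1$ | $x0 = 0$
    Idempotent | $x \lor x = x$ | $xx = x$
    Inverse | $x \lor x' = 1$ | $xx' = 0$
    Commutative | $x \lor y = y \lor x$ | $xy = yx$
    Associative | $(x \lor y) \lor z = x \lor (y \lor z)$ | $(xy)z = x(yz)$
    Distributive | $x \lor (yz) = (x \lor y)(x \lor z)$ | $x(y \lor z) = (xy) \lor (xz)$
    DeMorgan's | $(x \lor y)' = x'y'$ | $(xy)' = x' \lor y'$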

    Proving the Equivalence of Logic Expressions

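    Example 1.1

    • Truth-table method: Exhaustive verification
    • Arithmetic substitution: $x \lor y = x + y - xy$ and $x \oplus y = x + y - 2xy$
    • Case analysis: two cases, $x = 0$ or $x = 1$
    • Logic expression manipulation

    Example: $x \oplus y \stackrel{?}{=} x'y \lor xy'$
    $x + y - 2xy \stackrel{?}{=} (1 - x)y + x(1 - y) - (1 - x)yx(1 - y)$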
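
    The truth-table method is easy to automate for small numbers of variables. Below is a minimal Python sketch of exhaustive verification; the function name `equivalent` and the two lambda expressions are illustrative choices, not taken from the textbook.

```python
from itertools import product

def equivalent(f, g, num_vars):
    """Return True if f and g agree on every row of the 2**num_vars truth table."""
    return all(
        bool(f(*row)) == bool(g(*row))
        for row in product((False, True), repeat=num_vars)
    )

# x XOR y versus the sum-of-products form x'y OR xy'
xor_direct = lambda x, y: x != y
xor_sop = lambda x, y: (not x and y) or (x and not y)

print(equivalent(xor_direct, xor_sop, 2))
```

    The number of rows doubles with each added variable, which is why the other three methods matter for larger expressions.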

    1.3 Designing Gate Networks

    AND-OR, NAND-NAND, OR-AND, NOR-NOR

    Logic optimization: cost, speed, power dissipation

    Figure 1.6 A two-level AND-OR circuit and two equivalent circuits.

    (a) AND-OR circuit  (b) Intermediate circuit  (c) NAND-NAND equivalent

    $(a' \lor b' \lor c')' = abc$

    Seven-Segment Display of Decimal Digits

    Figure 1.7 Seven-segment display of decimal digits. The three open segments may be optionally used. The digit 1 can be displayed in two ways, with the more common right-side version shown.

    BCD-to-Seven-Segment Decoder

    Example 1.2

    Figure 1.8 The logic circuit that generates the enable signal for the lowermost segment (number 3) in a seven-segment display unit.

    Inputs: $x_3, x_2, x_1, x_0$ (4-bit input in [0, 9])
    Outputs: $e_0$ through $e_6$ (signals to enable or turn on segments 0 through 6)

    1.4 Useful Combinational Parts

    High-level building blocks

    Much like prefab parts used in building a house

    Arithmetic components (adders, multipliers, ALUs) will be covered in Part III

    Here we cover three useful parts: multiplexers, decoders/demultiplexers, encoders

    Multiplexers

    Figure 1.9 Multiplexer (mux), or selector, allows one of several inputs to be selected and routed to output depending on the binary value of a set of selection or address signals provided to it.

    (a) 2-to-1 mux  (b) Switch view  (c) Mux symbol
    (d) Mux array  (e) 4-to-1 mux with enable  (f) 4-to-1 mux design
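
    As a rough functional model of the multiplexers in Figure 1.9, the Python sketch below builds a 4-to-1 mux from three 2-to-1 muxes and models a mux array on whole words. The function names and the choice of forcing the output to 0 when the enable input is 0 are assumptions made for illustration.

```python
def mux2(x0, x1, y):
    """2-to-1 mux: output x0 when y = 0, x1 when y = 1."""
    return x1 if y else x0

def mux4(x, y1, y0, e=1):
    """4-to-1 mux built from three 2-to-1 muxes, with an enable input.

    x is a sequence (x0, x1, x2, x3) of bits; (y1, y0) is the 2-bit address.
    Assumed behavior: when e = 0 the output is forced to 0.
    """
    low = mux2(x[0], x[1], y0)    # selects between x0 and x1
    high = mux2(x[2], x[3], y0)   # selects between x2 and x3
    return mux2(low, high, y1) & e

def mux_array(word0, word1, y, width=32):
    """Array of 2-to-1 muxes sharing one select line, acting on width-bit words."""
    mask = (1 << width) - 1
    return mux2(word0, word1, y) & mask

# Address (y1, y0) = (1, 0) selects input x2.
print(mux4((0, 0, 1, 0), 1, 0))
```

    Building the larger mux out of smaller ones mirrors the hierarchical design in the figure; the address bits simply steer successive levels of selection.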

    Decoders/Demultiplexers

    Figure 1.10 A decoder allows the selection of one of $2^a$ options using an a-bit address as input. A demultiplexer (demux) is a decoder that only selects an output if its enable signal is asserted.

    (a) 2-to-4 decoder  (b) Decoder symbol  (c) Demultiplexer, or decoder with enable
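
    The decoder and demultiplexer described in Figure 1.10 can be modeled functionally as follows. This is a sketch under simple assumptions: outputs are returned as a Python list indexed from 0, and the names `decoder` and `demux` are illustrative.

```python
def decoder(address, a):
    """a-to-2**a decoder: returns a list with a single 1 at index `address`."""
    if not 0 <= address < 2 ** a:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(2 ** a)]

def demux(address, a, e):
    """Decoder with enable: all outputs are 0 unless e is asserted."""
    return [bit & e for bit in decoder(address, a)]

print(decoder(2, 2))
print(demux(2, 2, 0))
```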

    Encoders

    Figure 1.11 A $2^a$-to-a encoder outputs an a-bit binary number equal to the index of the single 1 among its $2^a$ inputs.

    (a) 4-to-2 encoder  (b) Encoder symbol
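
    An encoder performs the inverse mapping of a decoder. The sketch below is a hypothetical Python model of the behavior stated in the caption; rejecting inputs that do not contain exactly one 1 is a design choice made here, since the caption does not say what the hardware does in that case.

```python
def encoder(inputs):
    """2**a-to-a encoder: index of the single 1 among the inputs, MSB first."""
    n = len(inputs)
    a = n.bit_length() - 1
    if n != 2 ** a or sum(inputs) != 1:
        raise ValueError("expected 2**a inputs with exactly one 1")
    index = inputs.index(1)
    return [(index >> i) & 1 for i in reversed(range(a))]

# Input x2 is the only 1, so the output (y1, y0) encodes the index 2.
print(encoder([0, 0, 1, 0]))
```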

    1.5 Programmable Combinational Parts

    Programmable ROM (PROM)

    Programmable array logic (PAL)

    Programmable logic array (PLA)

    A programmable combinational part can do the job of many gates or gate networks

    Programmed by cutting existing connections (fuses) or establishing new connections (antifuses)

    PROMs

    Figure 1.12 Programmable connections and their use in a PROM.

    (a) Programmable OR gates  (b) Logic equivalent of part a  (c) Programmable read-only memory (PROM): inputs feed a decoder, whose outputs drive the programmable OR gates that produce the outputs

    PALs and PLAs

    Figure 1.13 Programmable combinational logic: general structure and two classes known as PAL and PLA devices. Not shown is PROM with fixed AND array (a decoder) and programmable OR array.

    (a) General programmable combinational logic: inputs, AND array (AND plane), OR array (OR plane), outputs
    (b) PAL: programmable AND array, fixed OR array (8-input ANDs)
    (c) PLA: programmable AND and OR arrays (6-input ANDs, 4-input ORs)

    1.6 Timing and Circuit Considerations

    Gate delay $\delta$: from a fraction of a nanosecond to a few nanoseconds

    Wire delay, previously negligible, is now important (electronic signals travel about 15 cm per ns)

    Circuit simulation to verify function and timing

    Changes in gate/circuit output, triggered by changes in its inputs, are not instantaneous

    Glitching

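    Figure 1.14 Timing diagram for a circuit that exhibits glitching.

    Using the PAL in Fig. 1.13b to implement $f = x \lor y \lor z$: one AND-OR (PAL) stage computes $a = x \lor y$ and a second computes $f = a \lor z$, each with delay $2\delta$.

    Waveforms shown: $x = 0$, $y$, $z$, $a = x \lor y$, $f = a \lor z$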

    CMOS Transmission Gates

    Figure 1.15 A CMOS transmission gate and its use in building a 2-to-1 mux.

    (a) CMOS transmission gate: circuit and symbol  (b) Two-input mux built of two transmission gates

    2 Digital Circuits with Memory

    Second of two chapters containing a review of digital design:
    • Combinational (memoryless) circuits in Chapter 1
    • Sequential circuits (with memory) in Chapter 2

    Topics in This Chapter
    2.1 Latches, Flip-Flops, and Registers
    2.2 Finite-State Machines
    2.3 Designing Sequential Circuits
    2.4 Useful Sequential Parts
    2.5 Programmable Sequential Parts
    2.6 Clocks and Timing of Events

    2.1 Latches, Flip-Flops, and Registers

    Figure 2.1 Latches, flip-flops, and registers.

    (a) SR latch  (b) D latch  (c) Master-slave D flip-flop  (d) D flip-flop symbol  (e) k-bit register

    Latches vs Flip-Flops

    Figure 2.2 Operations of D latch and negative-edge-triggered D flip-flop.

    Waveforms shown: D, C, D latch Q, D FF Q, with setup time and hold time marked
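
    The difference between level-sensitive and edge-triggered storage can be illustrated with a small discrete-time simulation. The sketch below is a simplified Python model, not from the slides: waveforms are lists of 0/1 samples, setup and hold times are ignored, and the function names are hypothetical.

```python
def d_latch(d_wave, c_wave, q0=0):
    """Level-sensitive D latch: Q follows D while C is 1 and holds otherwise."""
    q, out = q0, []
    for d, c in zip(d_wave, c_wave):
        if c == 1:
            q = d
        out.append(q)
    return out

def d_flip_flop_neg_edge(d_wave, c_wave, q0=0):
    """Negative-edge-triggered D flip-flop: Q changes only on a 1-to-0 clock edge."""
    q, out = q0, []
    prev_c, prev_d = c_wave[0], d_wave[0]
    for d, c in zip(d_wave, c_wave):
        if prev_c == 1 and c == 0:
            q = prev_d  # value of D just before the falling edge
        out.append(q)
        prev_c, prev_d = c, d
    return out

clock = [0, 1, 1, 0, 0, 1, 1, 0]
data = [1, 1, 1, 0, 0, 0, 1, 0]
print(d_latch(data, clock))               # [0, 1, 1, 1, 1, 0, 1, 1]
print(d_flip_flop_neg_edge(data, clock))  # [0, 0, 0, 1, 1, 1, 1, 1]
```

    In this run the latch output briefly drops while the clock is high because it is transparent to D, whereas the flip-flop output changes only at the two falling edges.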

    Reading and Modifying FFs in the Same Cycle

    Figure 2.3 Register-to-register operation with edge-triggered flip-flops.

    Elements shown: k-bit source register, computation module (combinational logic), k-bit destination register, clock
    Timing labels: propagation delay, combinational delay

    2.2 Finite-State Machines

    Example 2.1

    Figure 2.4 State table and state diagram for a vending machine coin reception unit.

    Next state, by current state and input:

    Current state | Dime | Quarter | Reset
    S00 | S10 | S25 | S00
    S10 | S20 | S35 | S00
    S20 | S30 | S35 | S00
    S25 | S35 | S35 | S00
    S30 | S35 | S35 | S00
    S35 | S35 | S35 | S00

    S00 is the initial state
    S35 is the final state
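
    A state table of this kind maps directly onto a lookup structure. The Python sketch below encodes one reading of the coin reception unit of Example 2.1, in which each state name records the amount deposited so far, 35 cents or more leads to S35, and Reset always returns to S00. The dictionary layout and the names `NEXT_STATE`, `step`, and `run` are illustrative.

```python
# Next-state table: state -> {input -> next state}. Reset is handled separately.
NEXT_STATE = {
    "S00": {"dime": "S10", "quarter": "S25"},
    "S10": {"dime": "S20", "quarter": "S35"},
    "S20": {"dime": "S30", "quarter": "S35"},
    "S25": {"dime": "S35", "quarter": "S35"},
    "S30": {"dime": "S35", "quarter": "S35"},
    "S35": {"dime": "S35", "quarter": "S35"},
}
INITIAL, FINAL = "S00", "S35"

def step(state, event):
    """Apply one input event and return the next state."""
    if event == "reset":
        return INITIAL
    return NEXT_STATE[state][event]

def run(events, state=INITIAL):
    """Feed a sequence of events to the machine, starting from `state`."""
    for event in events:
        state = step(state, event)
    return state

print(run(["dime", "quarter"]))        # S35
print(run(["dime", "dime", "dime"]))   # S30
```

    Keeping the table as data rather than as nested conditionals makes it straightforward to compare against the state diagram row by row.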

    Sequential Machine Implementation

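    Figure 2.5 Hardware realization of Moore and Mealy sequential machines.

    Blocks shown: next-state logic, state register, output logic
    Signals shown: inputs, next-state excitation signals, present state, outputs (bundle widths n, m, and l)
    The direct path from the inputs to the output logic is present only for a Mealy machine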

    2.3 Designing Sequential Circuits

    Example 2.3

    Figure 2.7 Hardware realization of a coin reception unit (Example 2.3).

    Inputs: q (quarter in), d (dime in)
    State flip-flops: FF2, FF1, FF0
    Output: e
    Final state is 1xx

    2.4 Useful Sequential Parts

    High-level building blocks

    Much like prefab closets used in building a house

    Other memory components will be covered in Chapter 17 (SRAM details, DRAM, Flash)

    Here we cover three useful parts: shift register, register file (SRAM basics), counter

    Shift Register

    Figure 2.8 Register with single-bit left shift and parallel load capabilities. For logical left shift, serial data in line is connected to 0.

    Signals shown: Shift, Load, parallel data in (k bits), parallel data out (k bits), serial data in, serial data out (MSB), k − 1 LSBs
    Example register content: 0 1 0 0 1 1 1 0
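
    The register of Figure 2.8 can be modeled behaviorally in Python. In this sketch the class name `ShiftRegister`, the method `clock`, and the rule that Load takes priority over Shift when both are asserted are assumptions made for illustration.

```python
class ShiftRegister:
    """k-bit register with parallel load and single-bit left shift."""

    def __init__(self, k):
        self.k = k
        self.value = 0

    def clock(self, load=0, shift=0, parallel_in=0, serial_in=0):
        """Apply one clock edge; return the MSB that was on serial data out."""
        mask = (1 << self.k) - 1
        serial_out = (self.value >> (self.k - 1)) & 1
        if load:  # assumed: Load has priority over Shift
            self.value = parallel_in & mask
        elif shift:
            self.value = ((self.value << 1) | (serial_in & 1)) & mask
        return serial_out

reg = ShiftRegister(8)
reg.clock(load=1, parallel_in=0b01001110)
reg.clock(shift=1)                 # logical left shift: serial data in is 0
print(format(reg.value, "08b"))    # 10011100
```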

    Register File and FIFO

    Figure 2.9 Register file with random access and FIFO.

    (a) Register file with random access: $2^h$ k-bit registers; write data (k bits), write address (h bits), write enable; read address 0 and read address 1 (h bits each), read data 0 and read data 1 (k bits each), read enable; a decoder on the write side and muxes on the read side
    (b) Graphic symbol for register file
    (c) FIFO symbol: input and output (k bits each), Push, Pop, Full, Empty
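
    A register file with one write port and two read ports can be sketched as a small Python class. The interface below (`RegisterFile`, `read`, `write`) is hypothetical, and returning `None` for both read ports when read enable is 0 is a modeling choice rather than a description of the hardware.

```python
class RegisterFile:
    """2**h registers of k bits each, with one write port and two read ports."""

    def __init__(self, h, k):
        self.k = k
        self.regs = [0] * (2 ** h)

    def read(self, addr0, addr1, read_enable=1):
        """Return (read data 0, read data 1) for the two read addresses."""
        if not read_enable:
            return None, None
        return self.regs[addr0], self.regs[addr1]

    def write(self, addr, data, write_enable=1):
        """Store the low k bits of data at addr when write enable is asserted."""
        if write_enable:
            self.regs[addr] = data & ((1 << self.k) - 1)

rf = RegisterFile(h=3, k=8)
rf.write(5, 0x1FF)       # only the low 8 bits are kept
print(rf.read(5, 0))     # (255, 0)
```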

    SRAM

    Figure 2.10 SRAM memory is simply a large, single-port register file.

    (a) SRAM block diagram: address (h bits), data in (g bits), data out (g bits), write enable, output enable, chip select
    (b) SRAM read mechanism: address, row decoder, square or almost square memory matrix, row buffer, column mux, g bits data out

    Binary Counter

    Figure 2.11 Synchronous binary counter with initialization capability.

    Blocks shown: count register, mux, incrementer (input x, output x + 1, with carry in and carry out)
    Signals shown: Input, 0, Load, IncrInit

    2.5 Programmable Sequential Parts

    Programmable array logic (PAL)

    Field-programmable gate array (FPGA)

    Both types contain macrocells and interconnects

    A programmable sequential part contains gates and memory elements

    Programmed by cutting existing connections (fuses) or establishing new connections (antifuses)

    PAL and FPGA

    Figure 2.12 Examples of programmable sequential logic.

    (a) Portion of PAL with storable output: 8-input ANDs, flip-flop, muxes
    (b) Generic structure of an FPGA: I/O blocks, configurable logic blocks (CLBs), programmable connections

    2.6 Clocks and Timing of Events

    Clock is a periodic signal: clock rate = clock frequency
    The inverse of clock rate is the clock period: 1 GHz $\leftrightarrow$ 1 ns
    Constraint: Clock period $\geq t_{prop} + t_{comb} + t_{setup} + t_{skew}$

    Figure 2.13 Determining the required length of the clock period.

    Elements shown: FF1 (Clock1), combinational logic with other inputs, FF2 (Clock2)
    The clock period, spanning from when FF1 begins to change to when the FF1 change is observed, must be wide enough to accommodate worst-case delays
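
    The clock-period constraint translates into a one-line calculation. The Python sketch below uses made-up delay values in nanoseconds purely for illustration; the function names are hypothetical.

```python
def min_clock_period_ns(t_prop, t_comb, t_setup, t_skew):
    """Smallest clock period (ns) satisfying the constraint above."""
    return t_prop + t_comb + t_setup + t_skew

def max_clock_rate_ghz(period_ns):
    """Clock rate is the inverse of the period: 1 ns corresponds to 1 GHz."""
    return 1.0 / period_ns

# Hypothetical delays, in nanoseconds.
period = min_clock_period_ns(t_prop=0.2, t_comb=1.5, t_setup=0.2, t_skew=0.1)
print(round(period, 3), "ns")
print(round(max_clock_rate_ghz(period), 3), "GHz")
```

    With these assumed numbers the combinational delay dominates, which is typical of why shortening the longest logic path is the usual route to a faster clock.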

    Synchronization

    Figure 2.14 Synchronizers are used to prevent timing problems arising from untimely changes in asynchronous signals.

    (a) Simple synchronizer: one flip-flop between the asynch input and its synch version
    (b) Two-FF synchronizer: FF1 and FF2 in series between the asynch input and its synch version
    (c) Input and output waveforms: clock, asynch input, synch version

    Level-Sensitive Operation

    Figure 2.15 Two-phase clocking with nonoverlapping clock signals.

    Elements shown: latches alternating with combinational logic blocks (each with other inputs), clocked by $\phi_1$ and $\phi_2$
    Waveforms shown: clocks $\phi_1$ and $\phi_2$ with nonoverlapping highs over one clock period

    3 Computer System Technology

    Interplay between architecture, hardware, and software
    • Architectural innovations influence technology
    • Technological advances drive changes in architecture

    Topics in This Chapter
    3.1 From Components to Applications
    3.2 Computer Systems and Their Parts
    3.3 Generations of Progress
    3.4 Processor and Memory Technologies
    3.5 Peripherals, I/O, and Communications
    3.6 Software Systems and Applications

    3.1 From Components to Applications

    Figure 3.1 Subfields or views in computer system engineering.

    From high-level view (software) to low-level view (hardware): application domains, application designer, system designer, computer designer, logic designer, circuit designer, electronic components
    Spans labeled: computer architecture, computer organization

    What Is (Computer) Architecture?

    Figure 3.2 Like a building architect, whose place at the engineering/arts and goals/means interfaces is seen in this diagram, a computer architect reconciles many conflicting or competing demands.

    The architect sits at the interface between goals and means, and between arts and engineering:
    • Client's taste: mood, style, . . .
    • Client's requirements: function, cost, . . .
    • The world of arts: aesthetics, trends, . . .
    • Construction technology: material, codes, . . .

    3.2 Computer Systems and Their Parts

    Figure 3.3 The space of computer systems, with what we normally mean by the word "computer" highlighted.

    Successive classifications of "Computer":
    • Analog vs Digital
    • Fixed-function vs Stored-program
    • Electronic vs Nonelectronic
    • General-purpose vs Special-purpose
    • Number cruncher vs Data manipulator

    Price/Performance Pyramid

    Figure 3.4 Classifying computers by computational power and price range.

    Class | Price range
    Super | $Millions
    Mainframe | $100s Ks
    Server | $10s Ks
    Workstation | $1000s
    Personal | $100s
    Embedded | $10s

    Differences in scale, not in substance

    Automotive Embedded Computers

    Figure 3.5 Embedded computers are ubiquitous, yet invisible. They are found in our automobiles, appliances, and many other places.

    Components labeled: engine, impact sensors, navigation & entertainment, central controller, brakes, airbags

    Personal Computers and Workstations

    Figure 3.6 Notebooks, a common class of portable computers, are much smaller than desktops but offer substantially the same capabilities. What are the main reasons for the size difference?

    Digital Computer Subsystems

    Figure 3.7 The (three, four, five, or) six main units of a digital computer. Usually, the link unit (a simple bus or a more elaboratenetwork) is not explicitly included in such diagrams.

    Units shown: Memory; Processor (CPU), consisting of Control and Datapath; Input and Output (I/O); Link (to/from network)

    3.3 Generations of Progress

    Table 3.2 The 5 generations of digital computers, and their ancestors.

    Generation (begun) | Processor technology | Memory innovations | I/O devices introduced | Dominant look & feel
    0 (1600s) | (Electro-)mechanical | Wheel, card | Lever, dial, punched card | Factory equipment
    1 (1950s) | Vacuum tube | Magnetic drum | Paper tape, magnetic tape | Hall-size cabinet
    2 (1960s) | Transistor | Magnetic core | Drum, printer, text terminal | Room-size mainframe
    3 (1970s) | SSI/MSI | RAM/ROM chip | Disk, keyboard, video monitor | Desk-size mini
    4 (1980s) | LSI/VLSI | SRAM/DRAM | Network, CD, mouse, sound | Desktop/laptop micro
    5 (1990s) | ULSI/GSI/WSI, SOC | SDRAM, flash | Sensor/actuator, point/click | Invisible, embedded

    IC Production and Yield

    Figure 3.8 The manufacturing process for an IC part.

    Stages shown, in order: silicon crystal ingot (15-30 cm by 30-60 cm); slicer; blank wafer with defects (0.2 cm); processing, 20-30 steps; patterned wafer (100s of simple or scores of complex processors); dicer; die (~1 cm); die tester; good die (~1 cm); mounting; microchip or other part; part tester; usable part to ship

    Effect of Die Size on Yield

    Figure 3.9 Visualizing the dramatic decrease in yield with larger dies.

    Wafers shown: 120 dies, 109 good; 26 dies, 15 good

    Die yield $=_{\text{def}}$ (number of good dies) / (total number of dies)

    Die yield $=$ Wafer yield $\times\ [1 + (\text{Defect density} \times \text{Die area}) / a]^{-a}$

    Die cost $=$ (cost of wafer) / (total number of dies $\times$ die yield)
    $=$ (cost of wafer) $\times$ (die area / wafer area) / (die yield)
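
    The yield and cost formulas can be tried out numerically. The Python sketch below uses the die counts quoted for the two wafers in Figure 3.9 and a hypothetical wafer cost of $1000; the exponent is taken as the negative of the parameter a, the usual form of this yield model, and the function names are illustrative.

```python
def die_yield(wafer_yield, defect_density, die_area, a):
    """Yield model: wafer yield x [1 + (defect density x die area) / a] ** (-a)."""
    return wafer_yield * (1 + defect_density * die_area / a) ** (-a)

def die_cost(wafer_cost, dies_per_wafer, yield_fraction):
    """Cost per good die: wafer cost / (total number of dies x die yield)."""
    return wafer_cost / (dies_per_wafer * yield_fraction)

# Observed yields from the two wafers in the figure.
small_die_yield = 109 / 120
large_die_yield = 15 / 26

WAFER_COST = 1000.0  # hypothetical wafer cost in dollars
print(round(small_die_yield, 3), round(die_cost(WAFER_COST, 120, small_die_yield), 2))
print(round(large_die_yield, 3), round(die_cost(WAFER_COST, 26, large_die_yield), 2))
```

    Because both the number of dies per wafer and the yield fall as the die grows, the cost per good die rises much faster than the die area.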

    3.4 Processor and Memory Technologies

    Figure 3.11 Packaging of processor, memory, and other components.

    (a) 2D or 2.5D packaging now common: PC board, backplane, memory, CPU, bus, connector
    (b) 3D packaging of the future: die, stacked layers glued together, interlayer connections deposited on the outside of the stack

    Figure 3.10 Trends in processor performance and DRAM memory chip capacity (Moore's law).

    Axes: calendar year (1980 to 2010); processor performance (kIPS, MIPS, GIPS, TIPS); memory chip capacity (kb, Mb, Gb, Tb)
    Processor data points: 68000, 80286, 80386, 80486, 68040, Pentium, Pentium II, R10000; trend ×1.6 / yr, ×2 / 18 mos, ×10 / 5 yrs
    Memory data points: 64 kb, 256 kb, 1 Mb, 4 Mb, 16 Mb, 64 Mb, 256 Mb, 1 Gb; trend ×4 / 3 yrs
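
    The three ways of quoting the processor trend (a factor of 1.6 per year, 2 per 18 months, 10 per 5 years) are approximately the same compound rate, which a short calculation confirms. This Python sketch is illustrative; the function names are hypothetical.

```python
def growth_over(rate_per_year, years):
    """Total growth factor after compounding an annual rate for `years` years."""
    return rate_per_year ** years

def annual_rate(factor, years):
    """Equivalent annual growth factor for a total `factor` reached in `years` years."""
    return factor ** (1 / years)

print(round(growth_over(1.6, 1.5), 2))  # growth in 18 months at 1.6 per year
print(round(growth_over(1.6, 5), 2))    # growth in 5 years at 1.6 per year
print(round(annual_rate(4, 3), 3))      # memory: 4 per 3 years as an annual rate
```

    The memory trend of a factor of 4 every 3 years works out to roughly the same annual rate as the processor trend.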

    Pitfalls of Computer Technology Forecasting

    "DOS addresses only 1 MB of RAM because we cannot imagine any applications needing more." Microsoft, 1980

    "640K ought to be enough for anybody." Bill Gates, 1981

    "Computers in the future may weigh no more than 1.5 tons." Popular Mechanics

    "I think there is a world market for maybe five computers." Thomas Watson, IBM Chairman, 1943

    "There is no reason anyone would want a computer in their home." Ken Olsen, DEC founder, 1977

    "The 32-bit machine would be an overkill for a personal computer." Sol Libes, ByteLines

    3.5 Input/Output and Communications

    Figure 3.12 Magnetic and optical disk memory units.

    (a) Cutaway view of a hard disk drive (typically 2-9 cm)
    (b) Some removable storage media: floppy disk, CD-ROM, magnetic tape cartridge

    Communication Technologies

    Figure 3.13 Latency and bandwidth characteristics of different classes of communication links.

    Axes: bandwidth (b/s), with ticks at $10^3$, $10^6$, $10^9$, $10^{12}$; latency (s), with ticks at $10^{-9}$, $10^{-6}$, $10^{-3}$, 1, $10^3$ and unit markers ns, $\mu$s, ms, min, h
    Same geographic location: processor bus, I/O network, system-area network (SAN), local-area network (LAN)
    Geographically distributed: metro-area network (MAN), wide-area network (WAN)

    3.6 Software Systems and Applications

    Figure 3.15 Categorization of software, with examples in each class.

    Software
    • Application: word processor, spreadsheet, circuit simulator, . . .
    • System
      • Operating system
        • Manager: virtual memory, security, file system, . . .
        • Coordinator: scheduling, load balancing, diagnostics, . . .
        • Enabler: disk driver, display driver, printing, . . .
      • Translator: MIPS assembler, C compiler, . . .


    High- vs Low-Level Programming

    Figure 3.14 Models and abstractions in programming.

    Very high-level language objectives or tasks:
        Swap v[i] and v[i+1]

    Interpreter (one task = many statements)

    High-level language statements:
        temp=v[i]
        v[i]=v[i+1]
        v[i+1]=temp

    Compiler (one statement = several instructions)

    Assembly language instructions, mnemonic:
        add $2,$5,$5
        add $2,$2,$2
        add $2,$4,$2
        lw $15,0($2)
        lw $16,4($2)
        sw $16,0($2)
        sw $15,4($2)
        jr $31

    Assembler (mostly one-to-one)

    Machine language instructions, binary (hex):
        00a51020
        00421020
        00821020
        8c620000
        8cf20004
        acf20000
        ac620004
        03e00008

    Higher levels: more abstract, machine-independent; easier to write, read, debug, or maintain
    Lower levels: more concrete, machine-specific, error-prone; harder to write, read, debug, or maintain


    4 Computer Performance

    Performance is key in design decisions; also cost and power
    It has been a driving force for innovation
    Isn't quite the same as speed (higher clock rate)

    Topics in This Chapter
    4.1 Cost, Performance, and Cost/Performance
    4.2 Defining Computer Performance
    4.3 Performance Enhancement and Amdahl's Law
    4.4 Performance Measurement vs Modeling
    4.5 Reporting Computer Performance
    4.6 The Quest for Higher Performance


    4.1 Cost, Performance, and Cost/Performance

    [Plot: computer cost (axis marked $1, $1 K, $1 M, $1 G) versus calendar year (1960 to 2020)]


    Cost/Performance

    Figure 4.1 Performance improvement as a function of cost.

    [Plot: performance versus cost, with three curves: superlinear (economy of scale), linear (ideal?), and sublinear (diminishing returns)]


    4.2 Defining Computer Performance

    Figure 4.2 Pipeline analogy shows that imbalance between processing power and I/O capabilities leads to a performance bottleneck.

    [Diagram: Input, Processing, and Output stages, shown for a CPU-bound task and an I/O-bound task]


    Six Passenger Aircraft to Be Compared

    [Photographs of the aircraft; surviving labels: B 747, DC-8-50]


    Performance of Aircraft: An Analogy

    Table 4.1 Key characteristics of six passenger aircraft: all figures are approximate; some relate to a specific model/configuration of the aircraft or are averages of cited range of values.

    Aircraft       Passengers   Range (km)   Speed (km/h)   Price ($M)
    Airbus A310       250          8 300          895          120
    Boeing 747        470          6 700          980          200
    Boeing 767        250         12 300          885          120
    Boeing 777        375          7 450          980          180
    Concorde          130          6 400        2 200          350
    DC-8-50           145         14 000          875           80

    Speed of sound ≈ 1220 km/h


    Different Views of Performance

    Performance from the viewpoint of a passenger: Speed

    Note, however, that flight time is but one part of total travel time. Also, if the travel distance exceeds the range of a faster plane, a slower plane may be better due to not needing a refueling stop

    Performance from the viewpoint of an airline: Throughput

    Measured in passenger-km per hour (relevant if ticket price were proportional to distance traveled, which in reality it is not)

    Airbus A310   250 × 895  = 0.224 M passenger-km/hr
    Boeing 747    470 × 980  = 0.461 M passenger-km/hr
    Boeing 767    250 × 885  = 0.221 M passenger-km/hr
    Boeing 777    375 × 980  = 0.368 M passenger-km/hr
    Concorde      130 × 2200 = 0.286 M passenger-km/hr
    DC-8-50       145 × 875  = 0.127 M passenger-km/hr

    Performance from the viewpoint of FAA: Safety


    Cost Effectiveness: Cost/Performance

    Table 4.1 Key characteristics of six passenger aircraft: all figures are approximate; some relate to a specific model/configuration of the aircraft or are averages of cited range of values.

    Aircraft   Passengers   Range (km)   Speed (km/h)   Price ($M)   Throughput (M P km/hr)   Cost/Performance
    A310          250          8 300          895          120              0.224                   536
    B 747         470          6 700          980          200              0.461                   434
    B 767         250         12 300          885          120              0.221                   543
    B 777         375          7 450          980          180              0.368                   489
    Concorde      130          6 400        2 200          350              0.286                  1224
    DC-8-50       145         14 000          875           80              0.127                   630

    Throughput: larger values better
    Cost/Performance: smaller values better
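    The throughput and cost/performance columns can be recomputed from the raw table entries. The sketch below is one way to do it; the function and variable names are illustrative and not from the slides. Because it divides by unrounded throughput, a few results may differ from the table by one unit in the last digit.

```python
# Table 4.1 data: (passengers, speed in km/h, price in $M)
AIRCRAFT = {
    "Airbus A310": (250, 895, 120),
    "Boeing 747": (470, 980, 200),
    "Boeing 767": (250, 885, 120),
    "Boeing 777": (375, 980, 180),
    "Concorde": (130, 2200, 350),
    "DC-8-50": (145, 875, 80),
}


def throughput(passengers, speed_kmh):
    """Throughput in millions of passenger-km per hour."""
    return passengers * speed_kmh / 1e6


def cost_performance(price_musd, passengers, speed_kmh):
    """Price ($M) divided by throughput (M passenger-km/hr); smaller is better."""
    return price_musd / throughput(passengers, speed_kmh)


# Map each aircraft to (throughput, cost/performance)
results = {
    name: (throughput(p, v), cost_performance(price, p, v))
    for name, (p, v, price) in AIRCRAFT.items()
}

best_throughput = max(results, key=lambda name: results[name][0])
best_cost_perf = min(results, key=lambda name: results[name][1])

for name, (tp, cp) in results.items():
    print(f"{name:12s} throughput = {tp:.3f}  cost/performance = {cp:.0f}")
print("Highest throughput:", best_throughput)
print("Best cost/performance:", best_cost_perf)
```

    Under these two measures the fastest aircraft (Concorde) is not the best performer, which is the point of the analogy.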


    Concepts of Performance and Speedup

    Performance = 1 / Execution time is simplified to

    Performance = 1 / CPU execution time

    (Performance of M1) / (Performance of M2) = Speedup of M1 over M2

    = (Execution time of M2) / (Execution time of M1)

    Terminology: M1 is x times as fast as M2 (e.g., 1.5 times as fast)
    M1 is 100(x − 1)% faster than M2 (e.g., 50% faster)

    CPU time = Instructions × (Cycles per instruction) × (Secs per cycle)

    = Instructions × CPI / (Clock rate)

    Instruction count, CPI, and clock rate are not completely independent, so improving one by a given factor may not lead to overall execution time improvement by the same factor.
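    A small sketch of the CPU time formula and the speedup terminology. The two machines and their numbers are hypothetical, chosen only to show that doubling the clock rate need not halve the running time when CPI rises with it.

```python
def cpu_time(instructions, cpi, clock_rate_hz):
    """CPU time in seconds = Instructions x CPI / (Clock rate)."""
    return instructions * cpi / clock_rate_hz


def speedup(time_other, time_this):
    """Speedup of 'this' machine over 'other' = time_other / time_this."""
    return time_other / time_this


def percent_faster(x):
    """A machine that is x times as fast is 100(x - 1)% faster."""
    return 100 * (x - 1)


# Hypothetical numbers (not from the slides): the same 10^9-instruction program
# on M2 (1 GHz, CPI 1.5) and on M1 (2 GHz, CPI 2.0).
t_m2 = cpu_time(1e9, 1.5, 1e9)
t_m1 = cpu_time(1e9, 2.0, 2e9)
x = speedup(t_m2, t_m1)

print(f"M2: {t_m2:.2f} s, M1: {t_m1:.2f} s")
print(f"M1 is {x:.2f} times as fast as M2 ({percent_faster(x):.0f}% faster)")
```

    Here the clock rate doubles but the speedup is only 1.5, because the higher clock rate came with a higher CPI.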


    Elaboration on the CPU Time Formula

    CPU time = Instructions × (Cycles per instruction) × (Secs per cycle)

    = Instructions × Average CPI / (Clock rate)

    Secs per cycle = Clock period = 1 / (Clock rate)

    Clock rate: 1 GHz = 10^9 cycles / s (cycle time 10^-9 s = 1 ns)
    200 MHz = 200 × 10^6 cycles / s (cycle time = 5 ns)

    Average CPI: Is calculated based on the dynamic instruction mix and knowledge of how many clock cycles are needed to execute various instructions (or instruction classes)

    Instructions: Number of instructions executed, not number of instructions in our program (dynamic count)


    Dynamic Instruction Count

    How many instructions are executed in this program fragment?

    250 instructions
    for i = 1, 100 do
        20 instructions
        for j = 1, 100 do
            40 instructions
            for k = 1, 100 do
                10 instructions
            endfor
        endfor
    endfor

    Each "for" consists of two instructions: increment index, check exit condition

    Innermost (k) loop: 2 + 10 instructions per iteration, 100 iterations, 1200 instructions in all
    Middle (j) loop: 2 + 40 + 1200 instructions per iteration, 100 iterations, 124,200 instructions in all
    Outermost (i) loop: 2 + 20 + 124,200 instructions per iteration, 100 iterations, 12,422,200 instructions in all

    Dynamic count = 250 + 12,422,200 = 12,422,450 instructions
    Static count = 326

    Alternative loop headers: for i = 1, n; while x > 0
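    The counting argument can be checked mechanically. In this sketch each loop level is described by its straight-line instruction count and its iteration count, and the total is accumulated from the innermost loop outward; the helper names are illustrative, not from the slides.

```python
LOOP_OVERHEAD = 2  # each "for" costs two instructions per iteration:
                   # increment index, check exit condition


def dynamic_count(preamble, levels):
    """Instructions executed; levels = [(straight-line instrs, iterations), ...], outermost first."""
    inner = 0
    for body, iterations in reversed(levels):
        # One iteration runs the loop overhead, the loop's own body, and the nested loop
        inner = (LOOP_OVERHEAD + body + inner) * iterations
    return preamble + inner


def static_count(preamble, levels):
    """Instructions written in the program text, regardless of iteration counts."""
    return preamble + sum(body + LOOP_OVERHEAD for body, _ in levels)


levels = [(20, 100), (40, 100), (10, 100)]  # i, j, and k loops
total_dynamic = dynamic_count(250, levels)
total_static = static_count(250, levels)

print(f"Dynamic count = {total_dynamic:,}")
print(f"Static count  = {total_static}")
```

    With loop bounds such as n, or a while loop whose exit depends on data, the iteration counts passed in here would not be known until run time.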


    Faster Clock ≠ Shorter Running Time

    Figure 4.3 Faster steps do not necessarily mean shorter travel time.

    [Drawing: reaching the solution in 4 steps at 1 GHz versus 20 steps at 2 GHz]

    Suppose addition takes 1 ns

    Clock period = 1 ns; 1 cycle
    Clock period = 1/2 ns; 2 cycles

    In this example, addition time does not improve in going from 1 GHz to 2 GHz clock


    4.3 Performance Enhancement: Amdahl's Law

    Figure 4.4 Amdahl's law: speedup achieved if a fraction f of a task is unaffected and the remaining 1 − f part runs p times as fast.

    [Plot: speedup (s), 0 to 50, versus enhancement factor (p), 0 to 50, with curves for f = 0, 0.01, 0.02, 0.05, and 0.1]

    s = 1 / [f + (1 − f) / p] ≤ min(p, 1/f)

    f = fraction unaffected
    p = speedup of the rest
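    A minimal sketch of the formula, evaluated at p = 50 for the f values plotted in Figure 4.4, together with a check of the bound min(p, 1/f). The function names are illustrative.

```python
def amdahl_speedup(f, p):
    """Speedup when a fraction f is unaffected and the remaining 1 - f runs p times as fast."""
    return 1.0 / (f + (1.0 - f) / p)


def speedup_bound(f, p):
    """Upper bound min(p, 1/f); for f = 0 the bound is simply p."""
    return p if f == 0 else min(p, 1.0 / f)


p = 50
speedups = {f: amdahl_speedup(f, p) for f in (0, 0.01, 0.02, 0.05, 0.1)}

for f, s in speedups.items():
    print(f"f = {f:<5} s = {s:6.2f}  (bound {speedup_bound(f, p):.0f})")
```

    Even a small unaffected fraction caps the benefit: with f = 0.1, a 50-fold enhancement of the rest yields a speedup of less than 10.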


    Amdahl's Law Used in Design

    Example 4.1

    A processor spends 30% of its time on flp addition, 25% on flp mult, and 10% on flp division. Evaluate the following enhancements, each costing the same to implement:

    a. Redesign of the flp adder to make it twice as fast.

    b. Redesign of the flp multiplier to make it three times as fast.

    c. Redesign of the flp divider to make it 10 times as fast.

    Solution

    a. Adder redesign speedup = 1 / [0.7 + 0.3 / 2] = 1.18

    b. Multiplier redesign speedup = 1 / [0.75 + 0.25 / 3] = 1.20

    c. Divider redesign speedup = 1 / [0.9 + 0.1 / 10] = 1.10

    What if both the adder and the multiplier are redesigned?
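    One way to explore the closing question is to treat each redesigned unit's time share separately and leave the rest of the running time unchanged. The sketch below does this for the three single redesigns and for the adder and multiplier together; the helper name and dictionary layout are assumptions made for illustration.

```python
# Fraction of running time spent in each floating-point unit (Example 4.1)
FRACTIONS = {"add": 0.30, "mult": 0.25, "div": 0.10}


def speedup_after(enhancements, fractions=FRACTIONS):
    """Overall speedup when each unit in `enhancements` runs `factor` times as fast."""
    unaffected = 1.0 - sum(fractions[unit] for unit in enhancements)
    enhanced = sum(fractions[unit] / factor for unit, factor in enhancements.items())
    return 1.0 / (unaffected + enhanced)


adder = speedup_after({"add": 2})
multiplier = speedup_after({"mult": 3})
divider = speedup_after({"div": 10})
both = speedup_after({"add": 2, "mult": 3})

print(f"a. adder: {adder:.2f}  b. multiplier: {multiplier:.2f}  c. divider: {divider:.2f}")
print(f"adder and multiplier together: {both:.2f}")
```

    With both redesigns, the new running time is 0.45 + 0.3 / 2 + 0.25 / 3, giving a speedup of about 1.46.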


    Amdahl's Law Used in Management

    Example 4.2

    Members of a university research group frequently visit the library. Each library trip takes 20 minutes. The group decides to subscribe to a handful of publications that account for 90% of the library trips; access time to these publications is reduced to 2 minutes.

    a. What is the average speedup in access to publications?

    b. If the group has 20 members, each making two weekly trips to the library, what is the justifiable expense for the subscriptions? Assume 50 working weeks/yr and $25/h for a researcher's time.

    Solution

    a. Speedup in publication access time = 1 / [0.1 + 0.9 / 10] = 5.26

    b. Time saved = 20 × 2 × 50 × 0.9 × (20 − 2) = 32,400 min = 540 h

    Cost recovery = 540 × $25 = $13,500 = Max justifiable expense
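    The same arithmetic, written out step by step as a sketch. Variable names are illustrative; all numbers come from the example statement.

```python
TRIP_MIN = 20        # minutes per library trip
ACCESS_MIN = 2       # minutes to reach a subscribed publication
COVERED = 0.9        # fraction of trips replaced by subscriptions
MEMBERS = 20
TRIPS_PER_WEEK = 2
WEEKS_PER_YEAR = 50
RATE_PER_HOUR = 25   # dollars per hour of a researcher's time

# Part a: Amdahl's law with f = 1 - COVERED unaffected and p = 20 / 2 = 10
p = TRIP_MIN / ACCESS_MIN
speedup = 1.0 / ((1.0 - COVERED) + COVERED / p)

# Part b: yearly time saved on the covered trips, converted to dollars
trips_per_year = MEMBERS * TRIPS_PER_WEEK * WEEKS_PER_YEAR
saved_min = trips_per_year * COVERED * (TRIP_MIN - ACCESS_MIN)
saved_hours = saved_min / 60
max_expense = saved_hours * RATE_PER_HOUR

print(f"a. speedup = {speedup:.2f}")
print(f"b. time saved = {saved_min:,.0f} min = {saved_hours:.0f} h; max expense = ${max_expense:,.0f}")
```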


    4.4 Performance Measurement vs Modeling

    Figure 4.5 Running times of six programs on three machines.

    [Plot: execution time of programs A through F on Machine 1, Machine 2, and Machine 3]


    Generalized Amdahl's Law

    Original running time of a program = 1 = f1 + f2 + . . . + fk

    New running time after the fraction fi is speeded up by a factor pi

    = f1/p1 + f2/p2 + . . . + fk/pk

    Speedup formula

    S = 1 / [f1/p1 + f2/p2 + . . . + fk/pk]

    If a particular fraction is slowed down rather than speeded up, use sj fj instead of fj/pj, where sj > 1 is the slowdown factor


    Performance Benchmarks

    Example 4.3

    You are an engineer at Outtel, a start-up aspiring to compete with Intel via its new processor design that outperforms the latest Intel processor by a factor of 2.5 on floating-point instructions. This level of performance was achieved by design compromises that led to a 20% increase in the execution time of all other instructions. You are in charge of choosing benchmarks that would showcase Outtel's performance edge.

    a. What is the minimum required fraction f of time spent on floating-point instructions in a program on the Intel processor to show a speedup of 2 or better for Outtel?

    Solution

    a. We use a generalized form of Amdahl's formula in which a fraction f is speeded up by a given factor (2.5) and the rest is slowed down by another factor (1.2): 1 / [1.2(1 − f) + f / 2.5] ≥ 2 ⇒ f ≥ 0.875
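    A sketch of the generalized formula applied to this example. Each part of the program is given as a pair (fraction, time multiplier), where the multiplier is 1/p for a part speeded up by p and s for a part slowed down by s; this encoding and the function names are assumptions made for illustration.

```python
def generalized_speedup(parts):
    """Speedup 1 / sum(f_i * m_i) for parts given as (fraction f_i, time multiplier m_i)."""
    parts = list(parts)
    if abs(sum(f for f, _ in parts) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return 1.0 / sum(f * m for f, m in parts)


def outtel_speedup(f, p=2.5, s=1.2):
    """Floating-point fraction f runs p times as fast; the rest takes s times as long."""
    return generalized_speedup([(f, 1.0 / p), (1.0 - f, s)])


def min_fraction(target=2.0, p=2.5, s=1.2):
    """Solve 1 / [s(1 - f) + f / p] = target for f."""
    return (s - 1.0 / target) / (s - 1.0 / p)


f_min = min_fraction()
print(f"minimum floating-point fraction = {f_min:.3f}")
print(f"speedup at that fraction = {outtel_speedup(f_min):.2f}")
print(f"speedup at f = 0.5 = {outtel_speedup(0.5):.2f}")
```

    A benchmark spending only half its time in floating-point code would show a speedup of just 1.25, well short of the target.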


    Performance Estimation

    Average CPI = Σ over all instruction classes of (Class-i fraction) × (Class-i CPI)

    Machine cycle time = 1 / Clock rate

    CPU execution time = Instructions × (Average CPI) / (Clock rate)

    Table 4.3 Usage frequency, in percentage, for various instruction classes in four representative applications.

    Instrn class     Data compression   C language compiler   Reactor simulation   Atomic motion modeling
    A: Load/Store          25                   37                    32                     37
    B: Integer             32                   28                    17                      5
    C: Shift/Logic         16                   13                     2                      1
    D: Float                0                    0                    34                     42
    E: Branch              19                   13                     9                     10
    F: All others           8                    9                     6                      4


    CPI and IPS Calculations

    Example 4.4 (2 of 5 parts)

    Consider two implementations M1 (600 MHz) and M2 (500 MHz) of an instruction set containing three classes of instructions:

    Class   CPI for M1   CPI for M2   Comments
    F          5.0          4.0       Floating-point
    I          2.0          3.8       Integer arithmetic
    N          2.4          2.0       Nonarithmetic

    a. What are the peak performances of M1 and M2 in MIPS?

    b. If 50% of instructions executed are class-N, with the rest divided equally among F and I, which machine is faster? By what factor?

    Solution

    a. Peak MIPS for M1 = 600 / 2.0 = 300; for M2 = 500 / 2.0 = 250

    b. Average CPI for M1 = 5.0 / 4 + 2.0 / 4 + 2.4 / 2 = 2.95; for M2 = 4.0 / 4 + 3.8 / 4 + 2.0 / 2 = 2.95 ⇒ M1 is faster; factor 1.2
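    The two parts of the example can be reproduced with a short sketch. Peak MIPS is taken here as the clock rate in MHz divided by the smallest CPI, matching the solution; the data layout and function names are illustrative.

```python
CLOCK_MHZ = {"M1": 600, "M2": 500}
CPI = {
    "M1": {"F": 5.0, "I": 2.0, "N": 2.4},
    "M2": {"F": 4.0, "I": 3.8, "N": 2.0},
}
# Part b's mix: half class-N, the rest split equally between F and I
MIX = {"F": 0.25, "I": 0.25, "N": 0.50}


def peak_mips(machine):
    """Best case: every instruction belongs to the machine's lowest-CPI class."""
    return CLOCK_MHZ[machine] / min(CPI[machine].values())


def average_cpi(machine, mix):
    """Weighted sum of class CPIs, using the dynamic instruction mix as weights."""
    return sum(fraction * CPI[machine][cls] for cls, fraction in mix.items())


def mips(machine, mix):
    """MIPS rating for a given mix = clock rate (MHz) / average CPI."""
    return CLOCK_MHZ[machine] / average_cpi(machine, mix)


factor = mips("M1", MIX) / mips("M2", MIX)

for m in ("M1", "M2"):
    print(f"{m}: peak MIPS = {peak_mips(m):.0f}, average CPI = {average_cpi(m, MIX):.2f}")
print(f"M1 is {factor:.1f} times as fast as M2 on this mix")
```

    Since both machines have the same average CPI for this mix, the speed ratio reduces to the ratio of clock rates, 600 / 500.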


    MIPS Rating Can Be Misleading

    Example 4.5

    Two compilers produce machine code for a program on a machine with two classes of instructions. Here are the numbers of instructions:

    Class   CPI   Compiler 1   Compiler 2
    A        1       600M         400M
    B        2       400M         400M

    a. What are run times of the two programs with a 1 GHz clock?

    b. Which compiler produces faster code and by what factor?

    c. Which compiler's output runs at a higher MIPS rate?

    Solution

    a. Running time for compiler 1 = (600M × 1 + 400M × 2) / 10^9 = 1.4 s; for compiler 2 = (400M × 1 + 400M × 2) / 10^9 = 1.2 s

    b. Compiler 2's output runs 1.4 / 1.2 = 1.17 times as fast

    c. MIPS rating for compiler 1 (CPI = 1.4) = 1000 / 1.4 = 714; for compiler 2 (CPI = 1.5) = 1000 / 1.5 = 667
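    A sketch that recomputes the three answers from the instruction counts. It makes the mismatch explicit: the code with the higher MIPS rating is the one that takes longer to run. Names are illustrative.

```python
CLOCK_HZ = 1e9
CPI = {"A": 1, "B": 2}
COUNTS = {
    "Compiler 1": {"A": 600e6, "B": 400e6},
    "Compiler 2": {"A": 400e6, "B": 400e6},
}


def run_time(counts):
    """Running time in seconds = total clock cycles / clock rate."""
    cycles = sum(n * CPI[cls] for cls, n in counts.items())
    return cycles / CLOCK_HZ


def mips_rating(counts):
    """Millions of instructions executed per second."""
    return sum(counts.values()) / run_time(counts) / 1e6


times = {name: run_time(c) for name, c in COUNTS.items()}
ratings = {name: mips_rating(c) for name, c in COUNTS.items()}
faster = min(times, key=times.get)
higher_mips = max(ratings, key=ratings.get)

for name in COUNTS:
    print(f"{name}: {times[name]:.1f} s, {ratings[name]:.0f} MIPS")
print(f"Faster code: {faster}; higher MIPS rating: {higher_mips}")
```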


    4.5 Reporting Computer Performance

    Table 4.4 Measured or estimated execution times for three programs.

                  Time on machine X   Time on machine Y   Speedup of Y over X
    Program A             20                 200                  0.1
    Program B           1000                 100                 10.0
    Program C           1500                 150                 10.0
    All 3 progs         2520                 450                  5.6

    Analogy: If a car is driven to a city 100 km away at 100 km/hr and returns at 50 km/hr, the average speed is not (100 + 50) / 2 but is obtained from the fact that it travels 200 km in 3 hours.


    Comparing the Overall Performance

    Table 4.4 Measured or estimated execution times for three programs.

                      Time on machine X   Time on machine Y   Speedup of Y over X   Speedup of X over Y
    Program A                 20                 200                  0.1                  10
    Program B               1000                 100                 10.0                   0.1
    Program C               1500                 150                 10.0                   0.1
    Arithmetic mean                                                   6.7                   3.4
    Geometric mean                                                    2.15                  0.46

    Geometric mean does not yield a measure of overall speedup, but provides an indicator that at least moves in the right direction
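    The means in the table can be recomputed as in the sketch below (it assumes Python 3.8 or later for `math.prod`; the names are illustrative). The arithmetic means of the two speedup columns are both greater than 1, so each machine appears faster than the other, whereas the two geometric means are reciprocals of each other.

```python
import math

# Table 4.4: program -> (time on machine X, time on machine Y)
TIMES = {"A": (20, 200), "B": (1000, 100), "C": (1500, 150)}

y_over_x = [tx / ty for tx, ty in TIMES.values()]  # speedup of Y over X
x_over_y = [ty / tx for tx, ty in TIMES.values()]  # speedup of X over Y


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


# Speedup based on total running time of all three programs
overall = sum(tx for tx, _ in TIMES.values()) / sum(ty for _, ty in TIMES.values())

print(f"Arithmetic means: {arithmetic_mean(y_over_x):.1f} and {arithmetic_mean(x_over_y):.1f}")
print(f"Geometric means:  {geometric_mean(y_over_x):.2f} and {geometric_mean(x_over_y):.2f}")
print(f"Speedup of Y over X on total time: {overall:.1f}")
```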

    Effect of Instruction Mix on Performance


    Example 4.6 (1 of 3 parts)

    Consider two applications DC and RS and two machines M1 and M2:

    Class         Data Comp.   Reactor Sim.   M1's CPI   M2's CPI
    A: Ld/Str        25%           32%           4.0        3.8
    B: Integer       32%           17%           1.5        2.5
    C: Sh/Logic      16%            2%           1.2        1.2
    D: Float          0%           34%           6.0        2.6
    E: Branch        19%            9%           2.5        2.2
    F: Other          8%            6%           2.0        2.3

    a. Find the effective CPI for the two applications on both machines.

    Solution

    a. CPI of DC on M1: 0.25 × 4.0 + 0.32 × 1.5 + 0.16 × 1.2 + 0 × 6.0 + 0.19 × 2.5 + 0.08 × 2.0 = 2.31

    DC on M2: 2.54    RS on M1: 3.94    RS on M2: 2.89
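    All four effective CPI values follow from the same weighted sum. A sketch, with the table entered as dictionaries (the layout and names are illustrative):

```python
# Instruction mix per application (fractions of executed instructions)
MIX = {
    "DC": {"A": 0.25, "B": 0.32, "C": 0.16, "D": 0.00, "E": 0.19, "F": 0.08},
    "RS": {"A": 0.32, "B": 0.17, "C": 0.02, "D": 0.34, "E": 0.09, "F": 0.06},
}
# CPI per instruction class on each machine
CPI = {
    "M1": {"A": 4.0, "B": 1.5, "C": 1.2, "D": 6.0, "E": 2.5, "F": 2.0},
    "M2": {"A": 3.8, "B": 2.5, "C": 1.2, "D": 2.6, "E": 2.2, "F": 2.3},
}


def effective_cpi(app, machine):
    """Sum over classes of (class fraction) x (class CPI)."""
    return sum(fraction * CPI[machine][cls] for cls, fraction in MIX[app].items())


table = {(app, m): effective_cpi(app, m) for app in MIX for m in CPI}

for (app, m), cpi in table.items():
    print(f"{app} on {m}: {cpi:.2f}")
```

    M1 has the lower CPI for data compression, M2 for the reactor simulation, whose 34% floating-point share is costly on M1.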


    4.6 The Quest for Higher Performance

    State of available computing power ca. the early 2000s:

    Gigaflops on the desktop

    Teraflops in the supercomputer center

    Petaflops on the drawing board

    Note on terminology (see Table 3.1)

    Prefixes for large units:
    Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15

    For memory:
    K = 2^10 = 1024, M = 2^20, G = 2^30, T = 2^40, P = 2^50

    Prefixes for small units:
    micro = 10^-6, nano = 10^-9, pico = 10^-12, femto = 10^-15


    Performance Trends and Obsolescence

    Figure 3.10 Trends in processor performance and DRAM memory chip capacity (Moore's law).

    [Plot: processor performance (kIPS to TIPS) and memory chip capacity (kb to Tb) versus calendar year, 1980 to 2010. Processor points: 68000, 80286, 80386, 80486, 68040, Pentium, Pentium II, R10000; growth annotated as ×1.6 / yr, ×2 / 18 mos, ×10 / 5 yrs. Memory points: 64 kb, 256 kb, 1 Mb, 4 Mb, 16 Mb, 64 Mb, 256 Mb, 1 Gb; growth annotated as ×4 / 3 yrs.]

    "Can I call you back? We just bought a new computer and we're trying to set it up before it's obsolete."

    Figure 4.7 Exponential growth of supercomputer performance.

    [Plot: supercomputer performance (MFLOPS to PFLOPS) versus calendar year, 1980 to 2010. Vector supercomputers: Cray X-MP, Y-MP. Massively parallel processors (MPPs): CM-2, CM-5, with points labeled $30M MPPs and $240M MPPs.]

    The Most Powerful Computers


    Figure 4.8 Milestones in the DOE's Accelerated Strategic Computing Initiative (ASCI) program with extrapolation up to the PFLOPS level.

    [Plot: performance (TFLOPS), 1 to 1000, versus calendar year, 1995 to 2010, with plan, develop, and use phases marked. Machines: ASCI Red, ASCI Blue, ASCI White, ASCI Q, ASCI Purple. Milestones: 1+ TFLOPS, 0.5 TB; 3+ TFLOPS, 1.5 TB; 10+ TFLOPS, 5 TB; 30+ TFLOPS, 10 TB; 100+ TFLOPS, 20 TB.]

    Performance is Important, But It Isn't Everything


    Figure 25.1 Trend in computational performance per watt of power used in general-purpose processors and DSPs.

    [Plot: performance (kIPS to TIPS) versus calendar year, 1980 to 2010, with curves for absolute processor performance, GP processor performance per watt, and DSP performance per watt]

    Roadmap for the Rest of the Book


    Ch. 5-8: A simple ISA, variations in ISA
    Ch. 9-12: ALU design
    Ch. 13-14: Data path and control unit design
    Ch. 15-16: Pipelining and its limits
    Ch. 17-20: Memory (main, mass, cache, virtual)
    Ch. 21-24: I/O, buses, interrupts, interfacing
    Ch. 25-28: Vector and parallel processing

    Fasten your seatbelts as we begin our ride!