Computer Architecture Fundamentals


  • 8/8/2019 Computer Architecture Fundamentals

    1/74

    Chapter 1 - Fundamentals 1

    Computer Architecture

    Chapter 1: Fundamentals


    Introduction

    1.1 Introduction

    1.2 The Task of a Computer Designer

    1.3 Technology and Computer Usage Trends

    1.4 Cost and Trends in Cost

    1.5 Measuring and Reporting Performance

    1.6 Quantitative Principles of Computer Design

    1.7 Putting It All Together: The Concept of Memory Hierarchy


    Art and Architecture

    What's the difference between Art and Architecture?

    Lyonel Feininger, Marktkirche in Halle


    Art and Architecture

    What's the difference between Art and Architecture?

    Notre Dame de Paris


    What's Computer Architecture?

    "The attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."

    Amdahl, Blaauw, and Brooks, 1964


    What's Computer Architecture?

    1950s to 1960s: Computer Architecture Course
    Computer Arithmetic.

    1970s to mid 1980s: Computer Architecture Course
    Instruction Set Design, especially ISA appropriate for compilers. (What we'll do in Chapter 2)

    1990s to 2000s: Computer Architecture Course
    Design of CPU, memory system, I/O system, Multiprocessors. (All evolving at a tremendous rate!)


    The Task of a Computer Designer

    Evaluate Existing Systems for Bottlenecks
    Simulate New Designs and Organizations
    Implement Next Generation System

    Technology Trends
    Benchmarks
    Workloads
    Implementation Complexity


    Technology and Computer Usage Trends

    When building a Cathedral, numerous very practical considerations need to be taken into account:

    available materials
    worker skills
    willingness of the client to pay the price.

    Similarly, Computer Architecture is about working within constraints:

    What will the market buy?
    Cost/Performance
    Tradeoffs in materials and processes


    Trends

    Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip doubles every year. In 1975 he revised the rate to a doubling about every two years, and growth has CONTINUED at roughly that pace since then.

    [Figure: Transistors Per Chip (log scale, 1.E+03 to 1.E+08) versus year, 1970 to 2005. Processors plotted: 4004, 8086, 80286, 386, 486, Pentium, Power PC 601, Pentium Pro, Pentium II, Power PC G3, Pentium 3.]


    Trends

    Processor performance, as measured by the SPEC benchmark, has also risen dramatically.

    [Figure: SPEC performance (0 to 5000) versus year, 1987 to 2000. Machines plotted: Sun-4/260, MIPS M2000, IBM RS/6000, DEC AXP/500, DEC Alpha 4/266, DEC Alpha 5/500, DEC Alpha 21264/600, Alpha 6/833.]


    Trends

    Memory Capacity (and Cost) have changed dramatically in the last 20 years.

    [Figure: DRAM size (log scale, 1000 to 1000000000) versus year, 1970 to 2000.]

    year    size (Mb)   cycle time
    1980    0.0625      250 ns
    1983    0.25        220 ns
    1986    1           190 ns
    1989    4           165 ns
    1992    16          145 ns
    1996    64          120 ns
    2000    256         100 ns
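The growth rates implied by the DRAM table can be estimated with a short calculation. The sketch below assumes steady exponential change between the first and last rows; the helper name `years_per_factor` is made up for this illustration.

```python
import math

# DRAM generations from the table: (year, capacity in Mb, cycle time in ns)
DRAM = [
    (1980, 0.0625, 250),
    (1983, 0.25, 220),
    (1986, 1, 190),
    (1989, 4, 165),
    (1992, 16, 145),
    (1996, 64, 120),
    (2000, 256, 100),
]

def years_per_factor(year0, value0, year1, value1, factor):
    """Years needed to change by `factor`, ASSUMING steady exponential
    change between the two measurements (direction does not matter)."""
    total_ratio = max(value0, value1) / min(value0, value1)
    return (year1 - year0) * math.log(factor) / math.log(total_ratio)

first, last = DRAM[0], DRAM[-1]
capacity_4x = years_per_factor(first[0], first[1], last[0], last[1], 4)
latency_2x = years_per_factor(first[0], first[2], last[0], last[2], 2)

print(f"capacity quadruples about every {capacity_4x:.1f} years")
print(f"cycle time halves about every {latency_2x:.1f} years")
```

Capacity grows by a factor of 4096 over the 20 years while cycle time improves by only 2.5x, which is the gap the following slides build on.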


    Trends

    Based on SPEED, the CPU has improved dramatically, but memory and disk have improved only a little. This has led to dramatic changes in architecture, Operating Systems, and Programming practices.

            Capacity          Speed (latency)
    Logic   2x in 3 years     2x in 3 years
    DRAM    4x in 3 years     2x in 10 years
    Disk    4x in 3 years     2x in 10 years


    Measuring and Reporting Performance

    This section talks about:

    1. Metrics: how do we describe in a numerical way the performance of a computer?

    2. What tools do we use to find those metrics?


    Metrics

    Time to run the task (ExTime): Execution time, response time, latency

    Tasks per day, hour, week, sec, ns (Performance): Throughput, bandwidth

    Plane               Speed      DC to Paris   Passengers   Throughput (pmph)
    Boeing 747          610 mph    6.5 hours     470          286,700
    BAC/Sud Concorde    1350 mph   3 hours       132          178,200


    Metrics - Comparisons

    "X is n times faster than Y" means

    $$ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} $$

    Speed of Concorde vs. Boeing 747

    Throughput of Boeing 747 vs. Concorde
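The two comparisons can be worked through with the aircraft figures from the Metrics slide. This is a minimal sketch; the function names are chosen for this example only.

```python
# Figures from the aircraft table: speed (mph), DC-to-Paris time (hours), passengers
planes = {
    "Boeing 747": {"mph": 610, "hours": 6.5, "passengers": 470},
    "Concorde": {"mph": 1350, "hours": 3.0, "passengers": 132},
}

def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times speed."""
    return plane["passengers"] * plane["mph"]

def times_faster(time_x, time_y):
    """n such that X is n times faster than Y: ExTime(Y) / ExTime(X)."""
    return time_y / time_x

b747, concorde = planes["Boeing 747"], planes["Concorde"]

# Latency view: how much sooner does one passenger arrive?
latency_ratio = times_faster(concorde["hours"], b747["hours"])

# Throughput view: how much more work is done per hour?
throughput_ratio = throughput_pmph(b747) / throughput_pmph(concorde)

print(f"Concorde is {latency_ratio:.2f} times faster per trip")
print(f"Boeing 747 has {throughput_ratio:.2f} times the throughput")
```

Which plane "wins" depends on whether the metric is time per task or tasks per unit time.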


    Metrics - Comparisons

    Pat has developed a new product, "rabbit", about which she wishes to determine performance. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

    Performance Comparisons

    Product   Transactions / second   Seconds / transaction   Seconds to process transaction
    Turtle    30                      0.0333                  3
    Rabbit    60                      0.0167                  1

    Which of the following statements reflect the performance comparison of rabbit and turtle?

    o Rabbit is 100% faster than turtle.

    o Rabbit is twice as fast as turtle.

    o Rabbit takes 1/2 as long as turtle.

    o Rabbit takes 1/3 as long as turtle.

    o Rabbit takes 100% less time than turtle.

    o Rabbit takes 200% less time than turtle.

    o Turtle is 50% as fast as rabbit.

    o Turtle is 50% slower than rabbit.

    o Turtle takes 200% longer than rabbit.

    o Turtle takes 300% longer than rabbit.
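One way to check the statements is to turn each phrasing into a formula and apply it to the measurements. The sketch below does that; the helper names are illustrative, and which column a statement refers to (throughput or time to process one transaction) is left to the reader.

```python
# Measurements from the table: throughput (transactions/second) and
# latency (seconds to process one transaction)
turtle = {"tps": 30, "latency_s": 3}
rabbit = {"tps": 60, "latency_s": 1}

def percent_faster(perf_x, perf_y):
    """X is this many percent faster than Y (performance: higher is better)."""
    return (perf_x / perf_y - 1) * 100

def percent_less_time(time_x, time_y):
    """X takes this many percent less time than Y."""
    return (1 - time_x / time_y) * 100

def percent_longer(time_x, time_y):
    """X takes this many percent longer than Y."""
    return (time_x / time_y - 1) * 100

throughput_ratio = rabbit["tps"] / turtle["tps"]
latency_ratio = turtle["latency_s"] / rabbit["latency_s"]

print("throughput ratio (rabbit/turtle):", throughput_ratio)
print("latency ratio (turtle/rabbit):", latency_ratio)
print("rabbit % faster by throughput:", percent_faster(rabbit["tps"], turtle["tps"]))
print("rabbit % less time per transaction:",
      round(percent_less_time(rabbit["latency_s"], turtle["latency_s"]), 1))
print("turtle % longer per transaction:",
      percent_longer(turtle["latency_s"], rabbit["latency_s"]))
```

Note that "takes N% less time" can never reach 100% for a task that still takes some time, whatever the measurements are.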


    Metrics - Throughput

    [Figure: levels of a computer system (Application, Programming Language, Compiler, ISA, Datapath and Control, Function Units, Transistors / Wires / Pins) with the throughput metrics used at them:]

    Answers per month
    Operations per second
    (millions) of Instructions per second: MIPS
    (millions) of (FP) operations per second: MFLOP/s
    Megabytes per second
    Cycles per second (clock rate)


    Methods For Predicting Performance

    Benchmarks, Traces, Mixes
    Hardware: Cost, delay, area, power estimation
    Simulation (many levels): ISA, RT, Gate, Circuit
    Queuing Theory
    Rules of Thumb
    Fundamental Laws/Principles


    Benchmarks

    SPEC: System Performance Evaluation Cooperative

    First Round 1989
      10 programs yielding a single number (SPECmarks)

    Second Round 1992
      SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
      Compiler Flags unlimited. March 93 of DEC 4000 Model 610:
        spice: unix.c:/def=(sysv,has_bcopy,bcopy(a,b,c)=memcpy(b,a,c)
        wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
        nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas

    Third Round 1995
      new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
      benchmarks useful for 3 years
      Single flag setting for all programs: SPECint_base95, SPECfp_base95


    Benchmarks

    CINT2000 (Integer Component of SPEC CPU2000):

    Program       Language   What Is It
    164.gzip      C          Compression
    175.vpr       C          FPGA Circuit Placement and Routing
    176.gcc       C          C Programming Language Compiler
    181.mcf       C          Combinatorial Optimization
    186.crafty    C          Game Playing: Chess
    197.parser    C          Word Processing
    252.eon       C++        Computer Visualization
    253.perlbmk   C          PERL Programming Language
    254.gap       C          Group Theory, Interpreter
    255.vortex    C          Object-oriented Database
    256.bzip2     C          Compression
    300.twolf     C          Place and Route Simulator

    http://www.spec.org/osg/cpu2000/CINT2000/


    Benchmarks

    CFP2000 (Floating Point Component of SPEC CPU2000):

    Program        Language     What Is It
    168.wupwise    Fortran 77   Physics / Quantum Chromodynamics
    171.swim       Fortran 77   Shallow Water Modeling
    172.mgrid      Fortran 77   Multi-grid Solver: 3D Potential Field
    173.applu      Fortran 77   Parabolic / Elliptic Differential Equations
    177.mesa       C            3-D Graphics Library
    178.galgel     Fortran 90   Computational Fluid Dynamics
    179.art        C            Image Recognition / Neural Networks
    183.equake     C            Seismic Wave Propagation Simulation
    187.facerec    Fortran 90   Image Processing: Face Recognition
    188.ammp       C            Computational Chemistry
    189.lucas      Fortran 90   Number Theory / Primality Testing
    191.fma3d      Fortran 90   Finite-element Crash Simulation
    200.sixtrack   Fortran 77   High Energy Physics Accelerator Design
    301.apsi       Fortran 77   Meteorology: Pollutant Distribution

    http://www.spec.org/osg/cpu2000/CFP2000/


    Benchmarks: Sample Results For SPECint2000

    Intel OR840 (1 GHz Pentium III processor)

                  Base       Base       Base    Peak       Peak       Peak
    Benchmarks    Ref Time   Run Time   Ratio   Ref Time   Run Time   Ratio
    164.gzip      1400       277        505*    1400       270        518*
    175.vpr       1400       419        334*    1400       417        336*
    176.gcc       1100       275        399*    1100       272        405*
    181.mcf       1800       621        290*    1800       619        291*
    186.crafty    1000       191        522*    1000       191        523*
    197.parser    1800       500        360*    1800       499        361*
    252.eon       1300       267        486*    1300       267        486*
    253.perlbmk   1800       302        596*    1800       302        596*
    254.gap       1100       249        442*    1100       248        443*
    255.vortex    1900       268        710*    1900       264        719*
    256.bzip2     1500       389        386*    1500       375        400*
    300.twolf     3000       784        382*    3000       776        387*

    SPECint_base2000   438
    SPECint2000        442

    http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
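SPEC CPU2000 summarizes the per-benchmark ratios with a geometric mean. The sketch below recomputes the two summary numbers from the ratio columns of the table. Because the table lists rounded ratios, the recomputed values should land close to, but not necessarily exactly on, the reported 438 and 442.

```python
import math

# Base and peak ratios copied from the SPECint2000 results table
base_ratios = [505, 334, 399, 290, 522, 360, 486, 596, 442, 710, 386, 382]
peak_ratios = [518, 336, 405, 291, 523, 361, 486, 596, 443, 719, 400, 387]

def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

base_score = geometric_mean(base_ratios)
peak_score = geometric_mean(peak_ratios)

print(f"recomputed SPECint_base2000: {base_score:.1f}")
print(f"recomputed SPECint2000:      {peak_score:.1f}")
```

A geometric mean keeps one very fast or very slow benchmark from dominating the summary the way it would in an arithmetic mean.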


    Benchmarks

    Performance Evaluation

    For better or worse, benchmarks shape a field.

    Good products are created when we have:
      Good benchmarks
      Good ways to summarize performance

    Given that sales is in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.

    If the benchmarks/summary are inadequate, then one must choose between improving the product for real programs vs. improving the product to get more sales; Sales almost always wins!

    Execution time is the measure of computer performance!


    Benchmarks

    How to Summarize Performance

    Management would like to have one number.

    Technical people want more:

    1. They want to have evidence of reproducibility: there should be enough information so that you or someone else can repeat the experiment.

    2. There should be consistency when doing the measurements multiple times.

    How would you report these results?

                         Computer A   Computer B   Computer C
    Program P1 (secs)    1            10           20
    Program P2 (secs)    1000         100          20
    Total Time (secs)    1001         110          40
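Different summaries of the same table can rank the machines differently. The sketch below compares total time with a geometric mean of times normalized to a reference machine; the function names and the choice of Computer A as the reference are assumptions made for this illustration.

```python
import math

# Execution times in seconds, from the table
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}

def total_time(machine):
    """Sum of the execution times of all programs on one machine."""
    return sum(times[machine].values())

def normalized_geometric_mean(machine, reference):
    """Geometric mean of per-program times relative to `reference`."""
    ratios = [times[machine][p] / times[reference][p] for p in times[machine]]
    return math.prod(ratios) ** (1 / len(ratios))

for m in times:
    print(m, "total:", total_time(m),
          "geo. mean vs A:", round(normalized_geometric_mean(m, "A"), 3))
```

By total time, B is about 9 times faster than A, yet the normalized geometric mean rates A and B as equal. That is one reason a single number needs to come with the method used to produce it.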


    Quantitative Principles of Computer Design

    Make the common case fast.

    Amdahl's Law: Relates total speedup of a system to the speedup of some portion of that system.


    Amdahl's Law

    Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

    Speedup due to enhancement E:

    $$ \text{Speedup}(E) = \frac{\text{Execution\_Time\_Without\_Enhancement}}{\text{Execution\_Time\_With\_Enhancement}} = \frac{\text{Performance\_With\_Enhancement}}{\text{Performance\_Without\_Enhancement}} $$


    Amdahl's Law

    $$ \text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right] $$

    $$ \text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}} $$


    Amdahl's Law

    Floating point instructions improved to run 2X; but only 10% of actual instructions are FP

    $$ \text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old} $$

    $$ \text{Speedup}_{overall} = \frac{1}{0.95} = 1.053 $$
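Amdahl's Law is short enough to check numerically. The sketch below is a direct transcription of the formula; the function name `amdahl_speedup` is chosen for this example.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by `speedup_enhanced` and the rest is unchanged."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# The floating point example: 10% of the time, made 2x faster
fp_example = amdahl_speedup(0.10, 2.0)
print(f"FP 2x on 10% of the time: {fp_example:.3f}")

# Even an unboundedly fast FP unit cannot beat 1 / (1 - 0.10)
print(f"upper bound for this fraction: {1.0 / (1.0 - 0.10):.3f}")
```

The bound 1 / (1 - Fraction_enhanced) is the quantitative form of "make the common case fast": speeding up a rarely used part yields little overall gain.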


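    Cycles Per Instruction

    Invest Resources where time is Spent!

    $$ \text{CPI} = \frac{\text{CPU\_Time} \times \text{Clock\_Rate}}{\text{Instruction\_Count}} = \frac{\text{Cycles}}{\text{Instruction\_Count}} $$

    $$ \text{CPU\_Time} = \text{Cycle\_Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i $$

    where $I_i$ is the number of instructions of type $i$.

    Instruction Frequency:

    $$ \text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction\_Count}} $$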


    Cycles Per Instruction

    Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.

    Base Machine (Reg / Reg)

    Op          Freq   Cycles   CPI(i)   (% Time)
    ALU         50%    1        .5       (33%)
    Load        20%    2        .4       (27%)
    Store       10%    2        .2       (13%)
    Branch      20%    2        .4       (27%)
    Total CPI                   1.5

    How do we get CPI(i)? How do we get % time?
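The two questions can be answered by computation: CPI(i) is frequency times cycles, and % time is each CPI(i) divided by the total CPI. A minimal sketch using the base machine's instruction mix follows (variable names are illustrative).

```python
# Instruction mix for the base machine: op -> (frequency, cycles per instruction)
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# CPI(i) = F_i * CPI_i: the cycles each instruction type adds to the average
cpi_contribution = {op: freq * cycles for op, (freq, cycles) in mix.items()}

# Total CPI is the sum of the contributions
total_cpi = sum(cpi_contribution.values())

# % time is each type's share of the total cycles
time_share = {op: c / total_cpi for op, c in cpi_contribution.items()}

for op in mix:
    print(f"{op:7s} CPI(i) = {cpi_contribution[op]:.1f}  time = {time_share[op]:.0%}")
print(f"Total CPI = {total_cpi:.1f}")
```

Loads and branches together account for more than half of the cycles even though ALU operations are half of the instructions.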


    Locality of Reference

    Programs access a relatively small portion of the address space at any instant of time.

    There are two different types of locality:

    Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)

    Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight line code, array access, etc.)


    The Concept of Memory Hierarchy

    Fast memory is expensive.

    Slow memory is cheap.

    The goal is to minimize the price/performance for a particular price point.


    Memory Hierarchy

    Level           Managed By   Access Time      Bandwidth (in MB/sec)   Size
    Registers       Compiler     1 nsec           10,000 - 50,000
    Level 1 cache   Hardware     3 nsec           2000 - 5000
    Level 2 Cache   Hardware     15 nsec          500 - 1000
    Memory          OS           150 nsec         500 - 1000
    Disk            OS/User      5,000,000 nsec   100                     > 5 Gigabytes


    Memory Hierarchy

    Hit: data appears in some block in the upper level (example: Block X)

      Hit Rate: the fraction of memory accesses found in the upper level
      Hit Time: Time to access the upper level, which consists of RAM access time + Time to determine hit/miss

    Miss: data needs to be retrieved from a block in the lower level (Block Y)

      Miss Rate = 1 - (Hit Rate)
      Miss Penalty: Time to replace a block in the upper level + Time to deliver the block to the processor

    Hit Time


    Memory Hierarchy

    What is the cost of executing a program if:

      Stores are free (there's a write pipe)
      Loads are 20% of all instructions
      80% of loads hit (are found) in the Level 1 cache
      97% of loads hit in the Level 2 cache.
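One possible way to set up this calculation is sketched below. It is not the slide's worked answer and rests on several labeled assumptions: access times are taken from the memory hierarchy table (3, 15 and 150 nsec), the 97% figure is read as the share of all loads satisfied by Level 1 or Level 2 together, each load pays only the access time of the level that satisfies it, and every non-load instruction costs 1 nsec.

```python
# Access times in nsec, from the memory hierarchy table
ACCESS_NS = {"L1": 3, "L2": 15, "memory": 150}

LOAD_FRACTION = 0.20        # loads are 20% of all instructions
L1_HIT = 0.80               # 80% of loads are found in Level 1
L2_CUMULATIVE_HIT = 0.97    # ASSUMPTION: 97% of all loads are found in L1 or L2
NON_LOAD_NS = 1             # ASSUMPTION: non-load instructions run at register speed

def average_load_ns():
    """Average time per load. ASSUMPTION: a load pays only the access
    time of the level that finally supplies the data."""
    served_by_l1 = L1_HIT
    served_by_l2 = L2_CUMULATIVE_HIT - L1_HIT
    served_by_memory = 1.0 - L2_CUMULATIVE_HIT
    return (served_by_l1 * ACCESS_NS["L1"]
            + served_by_l2 * ACCESS_NS["L2"]
            + served_by_memory * ACCESS_NS["memory"])

def average_instruction_ns():
    """Average time per instruction; stores are free of memory cost."""
    return (1.0 - LOAD_FRACTION) * NON_LOAD_NS + LOAD_FRACTION * average_load_ns()

print(f"average load: {average_load_ns():.2f} nsec")
print(f"average instruction: {average_instruction_ns():.2f} nsec")
```

Under these assumptions the 3% of loads that reach main memory contribute almost half of the average load time, which is why miss rates matter so much.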


    Wrap Up

    1.1 Introduction

    1.2 The Task of a Computer Designer

    1.3 Technology and Computer Usage Trends

    1.4 Cost and Trends in Cost

    1.5 Measuring and Reporting Performance

    1.6 Quantitative Principles of Computer Design

    1.7 Putting It All Together: The Concept of Memory Hierarchy


    Computer Architecture

    Chapter 2: Instruction Sets


    Introduction

    2.1 Introduction

    2.2 Classifying Instruction Set Architectures

    2.3 Memory Addressing

    2.4 Operations in the Instruction Set

    2.5 Type and Size of Operands

    2.6 Encoding and Instruction Set

    2.7 The Role of Compilers

    2.8 The MIPS Architecture

    Bonus


    Introduction

    The Instruction Set Architecture is that portion of the machine visible to the assembly level programmer or to the compiler writer.

    1. What are the advantages and disadvantages of various instruction set alternatives?

    2. How do languages and compilers affect ISA?

    3. Use the DLX architecture as an example of a RISC architecture.

    [Figure: software / instruction set / hardware.]


    Classifying Instruction Set Architectures

    Classifications can be by:

    1. Stack/accumulator/register
    2. Number of memory operands.
    3. Number of total operands.


    Instruction Set Architectures

    Basic ISA Classes

    Accumulator:
      1 address      add A        acc <- acc + mem[A]
      1+x address    addx A       acc <- acc + mem[A + x]

    Stack:
      0 address      add          tos <- tos + next

    General Purpose Register:
      2 address      add A B      EA(A) <- EA(A) + EA(B)
      3 address      add A B C    EA(A) <- EA(B) + EA(C)

    Load/Store:
      0 Memory       load R1, Mem1
                     load R2, Mem2
                     add R1, R2
      1 Memory       add R1, Mem2

    ALU Instructions can have two or three operands.

    ALU Instructions can have 0, 1, 2, 3 operands. Shown here are cases of 0 and 1.


    Instruction Set Architectures

    Basic ISA Classes

    The results of different address classes are easiest to see with the examples here, all of which implement the sequences for C = A + B.

    Stack     Accumulator   Register (Register-memory)   Register (load-store)
    Push A    Load A        Load R1, A                   Load R1, A
    Push B    Add B         Add R1, B                    Load R2, B
    Add       Store C       Store C, R1                  Add R3, R1, R2
    Pop C                                                Store C, R3

    Registers are the class that won out. The more registers on the CPU, the better.


    Instruction Set Architectures

    Intel 80x86 Integer Registers

    GPR0   EAX      Accumulator
    GPR1   ECX      Count register, string, loop
    GPR2   EDX      Data Register; multiply, divide
    GPR3   EBX      Base Address Register
    GPR4   ESP      Stack Pointer
    GPR5   EBP      Base Pointer for base of stack seg.
    GPR6   ESI      Index Register
    GPR7   EDI      Index Register
           CS       Code Segment Pointer
           SS       Stack Segment Pointer
           DS       Data Segment Pointer
           ES       Extra Data Segment Pointer
           FS       Data Seg. 2
           GS       Data Seg. 3
    PC     EIP      Instruction Counter
           Eflags   Condition Codes


    Memory Addressing

    Sections Include:

    Interpreting Memory Addresses
    Addressing Modes
    Displacement Address Mode
    Immediate Address Mode


    Memory Addressing

    Interpreting Memory Addresses

    What object is accessed as a function of the address and length?

    Objects have byte addresses: an address refers to the number of bytes counted from the beginning of memory.

    Little Endian: puts the byte whose address is xx00 at the least significant position in the word.

    Big Endian: puts the byte whose address is xx00 at the most significant position in the word.

    Alignment: data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.
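Byte order and alignment are easy to see on a concrete value. In the sketch below, index 0 of each byte string stands for the lowest address (the "xx00" byte); the example word and the helper `is_aligned` are chosen for this illustration.

```python
value = 0x0A0B0C0D  # an example 32-bit word

# Index 0 of each result is the byte stored at the lowest address
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

print("little endian, lowest address first:", little.hex(" "))
print("big endian, lowest address first:   ", big.hex(" "))

def is_aligned(address, size):
    """True if an object of `size` bytes at `address` sits on a boundary
    equal to its size."""
    return address % size == 0

print("4-byte word at address 8 aligned?", is_aligned(8, 4))
print("4-byte word at address 6 aligned?", is_aligned(6, 4))
```

In the little endian layout the lowest address holds 0x0D, the least significant byte; in the big endian layout it holds 0x0A, the most significant byte.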


    Memory Addressing

    Addressing Modes

    This table shows the most common modes. A more complete set is in Figure 2.6.

    Used for static data.R[R4]


    Addressing Modes

    Mode                Example               Meaning
    Register            add r4, r3            R[4] <- R[4]+R[3]
    Immediate           add r4, #3            R[4] <- R[4]+3
    Displacement        add r4, 100(r1)       R[4] <- R[4]+M[100+R[1]]
    Register indirect   add r4, (r1)          R[4] <- R[4]+M[R[1]]
    Indexed             add r3, (r1+r2)       R[3] <- R[3]+M[R[1]+R[2]]
    Direct/Absolute     add r1, (1001)        R[1] <- R[1]+M[1001]
    Memory indirect     add r1, @(r3)         R[1] <- R[1]+M[M[R[3]]]
    Autoincrement       add r1, (r2)+         R[1] <- R[1]+M[R[2]]; R[2] <- R[2]+d
    Autodecrement       add r1, -(r2)         R[2] <- R[2]-d; R[1] <- R[1]+M[R[2]]
    Scaled              add r1, 100(r2)[r3]   R[1] <- R[1]+M[100+R[2]+R[3]*d]

    ( ) memory access
    [ ] accessing a Register or Memory location
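The "Meaning" column can be read as executable rules. The following sketch models registers and memory as dictionaries and returns the second operand of `add` for each mode. The function name, the operand size `D = 4`, and the register and memory contents are all hypothetical values picked to exercise the modes.

```python
D = 4  # ASSUMPTION: operand size in bytes (the "d" in the table)

def fetch_operand(mode, regs, mem, r=None, x=None, imm=None, disp=0, addr=None):
    """Return the second operand of `add` for one addressing mode.

    regs maps register number -> value, mem maps address -> value.
    Autoincrement and autodecrement also update regs, as in the table.
    """
    if mode == "register":
        return regs[r]
    if mode == "immediate":
        return imm
    if mode == "displacement":
        return mem[disp + regs[r]]
    if mode == "register indirect":
        return mem[regs[r]]
    if mode == "indexed":
        return mem[regs[r] + regs[x]]
    if mode == "direct":
        return mem[addr]
    if mode == "memory indirect":
        return mem[mem[regs[r]]]
    if mode == "autoincrement":
        value = mem[regs[r]]
        regs[r] += D
        return value
    if mode == "autodecrement":
        regs[r] -= D
        return mem[regs[r]]
    if mode == "scaled":
        return mem[disp + regs[r] + regs[x] * D]
    raise ValueError(f"unknown addressing mode: {mode}")

# Hypothetical machine state, chosen only to exercise the modes
regs = {1: 200, 2: 8, 3: 2, 4: 10}
mem = {5: 77, 8: 40, 116: 3, 200: 5, 208: 9, 300: 7, 1001: 11}

# add r4, 100(r1)  ->  R[4] <- R[4] + M[100 + R[1]]
regs[4] += fetch_operand("displacement", regs, mem, r=1, disp=100)
print("r4 after add r4, 100(r1):", regs[4])
```

Every mode except register and immediate performs at least one memory access, and memory indirect performs two.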


    Memory Addressing

    Displacement Addressing Mode

    How big should the displacement be?

    For addresses that do fit in displacement size:

        Add R4, 10000(R0)

    For addresses that don't fit in displacement size, the compiler must do the following:

        Load R1, address
        Add R4, 0(R1)

    How big the displacement field should be depends on the displacements that typically occur.

    On DLX, the space allocated is 16 bits. IA-32 encodes displacements of 8 or 32 bits (8 or 16 bits in 16-bit addressing mode).


    Memory Addressing

    Immediate Address Mode

    Used where we want to get to a numerical value in an instruction.

    At high level:

        a = b + 3;
        if ( a > 17 )
            goto Addr

    At Assembler level:

        Load R2, 3
        Add R0, R1, R2
        Load R2, 17
        CMPBGT R1, R2
        Load R1, Address
        Jump (R1)

    So how would you get a 32 bit value into a register?
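One common answer on machines with 16-bit immediates is to build the constant in two steps: load the upper 16 bits, then OR in the lower 16 bits. The sketch below models that idea with plain integers. The function names are made up for this example; they stand in for instructions of the "load high immediate" and "OR immediate" kind and are not taken from the slides.

```python
def split_immediate(value):
    """Split a 32-bit value into its upper and lower 16-bit halves."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

def load_high(imm16):
    """Model of a 'load high immediate': place imm16 in the upper half
    of the register and clear the lower half."""
    return (imm16 & 0xFFFF) << 16

def or_immediate(reg, imm16):
    """Model of an 'OR immediate' with a zero-extended 16-bit value."""
    return reg | (imm16 & 0xFFFF)

def build_constant(value):
    """Two-instruction sequence that leaves `value` in a register."""
    high, low = split_immediate(value)
    reg = load_high(high)
    return or_immediate(reg, low)

print(hex(build_constant(0x12345678)))
```

Using OR with a zero-extended immediate, rather than an add with a sign-extended one, avoids having to adjust the upper half when bit 15 of the lower half is set.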


    Operations In The Instruction Set

    Sections Include:

    Detailed information about types of instructions.
    Instructions for Control Flow (conditional branches, jumps)


    Operations In The Instruction Set

    Operator Types

    Arithmetic and logical - and, add
    Data transfer - move, load
    Control - branch, jump, call
    System - system call, traps
    Floating point - add, mul, div, sqrt
    Decimal - add, convert
    String - move, compare
    Multimedia - 2D, 3D? e.g., Intel MMX and Sun VIS


    Operations In The Instruction Set

    Control Instructions

    Conditional branches are 20% of all instructions!!

    Control Instructions Issues:
      taken or not
      where is the target
      link return address
      save or restore

    Instructions that change the PC:
      (conditional) branches, (unconditional) jumps
      function calls, function returns
      system calls, system returns


    Operations In The Instruction Set

    Control Instructions

    There are numerous tradeoffs:

    Compare and branch
      + no extra compare, no state passed between instructions
      -- requires ALU op, restricts code scheduling opportunities

    Implicitly set condition codes - Z, N, V, C
      + can be set "for free"
      -- constrains code reordering, extra state to save/restore

    Explicitly set condition codes
      + can be set "for free", decouples branch/fetch from pipeline
      -- extra state to save/restore

    Condition in general-purpose register
      + no special state but uses up a register
      -- branch condition separate from branch logic in pipeline

    Some data for MIPS
      > 80% branches use immediate data, > 80% of those zero
      50% branches use == 0 or != 0

    Compromise in MIPS
      branch==0, branch!=0
      compare instructions for all other compares


    Operations In The Instruction Set

    Control Instructions

    Link Return Address:

    implicit register - many recent architectures use this
      + fast, simple
      -- s/w save register before next call, surprise traps?

    explicit register
      + may avoid saving register
      -- register must be specified

    processor stack
      + recursion direct
      -- complex instructions

    Save or restore state:

    What state?
      function calls: registers
      system calls: registers, flags, PC, PSW, etc

    Hardware need not save registers
      Caller can save registers in use
      Callee save registers it will use

    Hardware register save
      IBM STM, VAX CALLS
      Faster?

    Many recent architectures do no register saving
      Or do implicit register saving with register windows (SPARC)


    Type And Size of Operands

    The type of the operand is usually encoded in the Opcode: an LDW implies loading of a word.

    Common sizes are:

    Character (1 byte)
    Half word (16 bits)
    Word (32 bits)
    Single Precision Floating Point (1 Word)
    Double Precision Floating Point (2 Words)

    Integers are two's complement binary.

    Floating point is IEEE 754.

    Some languages (like COBOL) use packed decimal.


    Encoding And Instruction Set

    This section has to do with how an assembly level instruction is encoded into binary.

    Ultimately, it's the binary that is read and interpreted by the machine.

    We will be using the Intel instruction set which is defined at: http://developer.intel.com/design/Pentium4/manuals.

    Volume 2 has the instruction set.


    Encoding And

    Instruction Set

    80x86 Instruction

    Encodingfor ( index = 0; index < iterations; index++ )

    0040D3AF C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0

    0040D3B6 EB 09 jmp main+0D1h (0040d3c1)

    0040D3B8 8B 4D F0 mov ecx,dword ptr [ebp-10h]

    0040D3BB 83 C1 01 add ecx,1

    0040D3BE 89 4D F0 mov dword ptr [ebp-10h],ecx

    0040D3C1 8B 55 F0 mov edx,dword ptr [ebp-10h]

    0040D3C4 3B 55 F8 cmp edx,dword ptr [ebp-8]

    0040D3C7 7D 15 jge main+0EEh (0040d3de)

    long_temp = (*alignment + long_temp) % 47;

    0040D3C9 8B 45 F4 mov eax,dword ptr [ebp-0Ch]

    0040D3CC 8B 00 mov eax,dword ptr [eax]

    0040D3CE 03 45 EC add eax,dword ptr [ebp-14h]

    0040D3D1 99 cdq

    0040D3D2 B9 2F 00 00 00 mov ecx,2Fh

    0040D3D7 F7 F9 idiv eax,ecx

    0040D3D9 89 55 EC mov dword ptr [ebp-14h],edx

    0040D3DC EB DA jmp main+0C8h (0040d3b8)

    Here's some sample code that's been disassembled.

    It was compiled with the debug option, so it is not optimized.

    This code was produced using Visual Studio.
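The jump targets in the listing above can be checked by hand. EB (short JMP) and 7D (short JGE) are two-byte instructions whose second byte is a signed 8-bit displacement relative to the address of the next instruction. The sketch below, with the hypothetical helper name `short_jump_target`, reproduces that arithmetic; it covers only this two-byte short form.

```c
#include <stdint.h>

/*
 * Target of a 2-byte short jump (opcode byte + signed 8-bit displacement).
 * The displacement is relative to the address of the NEXT instruction.
 */
uint32_t short_jump_target(uint32_t instr_addr, uint8_t disp8)
{
    uint32_t next_instr = instr_addr + 2u;  /* opcode byte + displacement byte */

    /* Sign-extend the 8-bit displacement to 32 bits. */
    int32_t offset = (disp8 & 0x80u) ? (int32_t)disp8 - 256 : (int32_t)disp8;

    /* Unsigned addition wraps modulo 2^32, so negative offsets work too. */
    return next_instr + (uint32_t)offset;
}
```

For `EB 09` at 0040D3B6 this gives 0040D3C1, and for the backward jump `EB DA` at 0040D3DC it gives 0040D3B8, matching the targets the disassembler printed.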


    Encoding And Instruction Set: 80x86 Instruction Encoding

    for ( index = 0; index < iterations; index++ )

    00401000 8B 0D 40 54 40 00 mov ecx,dword ptr ds:[405440h]

    00401006 33 D2 xor edx,edx

    00401008 85 C9 test ecx,ecx

    0040100A 7E 14 jle 00401020

    0040100C 56 push esi

    0040100D 57 push edi

    0040100E 8B F1 mov esi,ecx

    long_temp = (*alignment + long_temp) % 47;

    00401010 8D 04 11 lea eax,[ecx+edx]

    00401013 BF 2F 00 00 00 mov edi,2Fh

    00401018 99 cdq

    00401019 F7 FF idiv eax,edi

    0040101B 4E dec esi

    0040101C 75 F2 jne 00401010

    0040101E 5F pop edi

    0040101F 5E pop esi

    00401020 C3 ret

    Here's some sample code that's been disassembled.

    It was compiled with optimization.

    This code was produced using Visual Studio.


    Encoding And Instruction Set: 80x86 Instruction Encoding

    for ( index = 0; index < iterations; index++ )

    0x804852f : add $0x10,%esp

    0x8048532 : lea 0xfffffff8(%ebp),%edx

    0x8048535 : test %esi,%esi

    0x8048537 : jle 0x8048543

    0x8048539 : mov %esi,%eax

    0x804853b : nop

    0x804853c : lea 0x0(%esi,1),%esi

    long_temp = (*alignment + long_temp) % 47;

    0x8048540 : dec %eax

    0x8048541 : jne 0x8048540

    0x8048543 : add $0xfffffff4,%esp

    Here's some sample code that's been disassembled.

    It was compiled with optimization.

    This code was produced using gcc and gdb.

    For details, see Lab 2.1.

    Note that the representation of the code is dependent on the compiler/debugger!


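    Encoding And Instruction Set: 80x86 Instruction Encoding

    Field layouts (field widths in bits):

    ADD:   ADD (4) | Reg (3) | W (1) | Disp. (8)
    SHL:   SHL (6) | V/w (2) | postbyte (8) | Disp. (8)
    TEST:  TEST (7) | W (1) | postbyte (8) | Immediate (8)

    A morass of disjoint encodings!

    This is Figure D.8.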


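    Encoding And Instruction Set: 80x86 Instruction Encoding

    Field layouts (field widths in bits):

    JE:     JE (4) | Cond (4) | Disp. (8)
    CALLF:  CALLF (8) | Offset (16) | Segment Number (16)
    MOV:    MOV (6) | D/w (2) | postbyte (8) | Disp. (8)
    PUSH:   PUSH (5) | Reg (3)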


    Encoding And Instruction Set: 80x86 Instruction Encoding

    C7 /0 MOV r/m32,imm32 Move an immediate 32 bit data item to a register or to memory.

    Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, or a doubleword.

    In our case, because of the C7 opcode, we know it's a sub-flavor of MOV that puts an immediate value into memory.

    Here's the instruction that we had several pages ago:

    0040D3AF C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0

    It is described in:

    http://developer.intel.com/design/pentium4/manuals/245471.htm

    (I found it on page 479, but this is obviously version dependent.)

    C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0

    C7: Op Code for Mov Immediate.

    45: Target Register + use next 8 bits as displacement.

    F0: This is -10 hex.

    00 00 00 00: 32 bits of 0.
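The byte-by-byte breakdown above can be sketched in C. The second byte (45) is the ModR/M postbyte, split as 2 bits of mod, 3 bits of reg, and 3 bits of r/m; mod = 01 means an 8-bit displacement follows, and for C7 /0 the reg field must be 0. The struct `MovImm32` and the function `decode_mov_imm32_disp8` are hypothetical names, and the sketch handles only this one form of the instruction, not the full x86 encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields of one instruction. */
typedef struct {
    uint8_t  opcode;  /* 0xC7: MOV r/m32, imm32 */
    uint8_t  mod;     /* ModR/M bits 7-6 */
    uint8_t  reg;     /* ModR/M bits 5-3 (the /0 opcode extension) */
    uint8_t  rm;      /* ModR/M bits 2-0 (5 = EBP when mod is 01) */
    int32_t  disp;    /* sign-extended 8-bit displacement */
    uint32_t imm;     /* 32-bit immediate, stored little-endian */
} MovImm32;

/*
 * Decode "C7 /0" with an 8-bit displacement (mod == 01), for example
 * C7 45 F0 00 00 00 00. Returns 0 on success, -1 if the bytes do not
 * match this one form. Other ModR/M forms are deliberately not handled.
 */
int decode_mov_imm32_disp8(const uint8_t *bytes, size_t len, MovImm32 *out)
{
    if (len < 7 || bytes[0] != 0xC7)
        return -1;

    uint8_t modrm = bytes[1];
    out->opcode = bytes[0];
    out->mod = (uint8_t)(modrm >> 6);
    out->reg = (uint8_t)((modrm >> 3) & 0x7);
    out->rm  = (uint8_t)(modrm & 0x7);

    /* rm == 4 would mean a SIB byte follows, which this sketch skips. */
    if (out->mod != 1 || out->reg != 0 || out->rm == 4)
        return -1;

    /* Sign-extend the displacement byte: 0xF0 becomes -16 (-10 hex). */
    out->disp = (bytes[2] & 0x80) ? (int32_t)bytes[2] - 256 : (int32_t)bytes[2];

    /* Assemble the little-endian 32-bit immediate. */
    out->imm = (uint32_t)bytes[3]
             | ((uint32_t)bytes[4] << 8)
             | ((uint32_t)bytes[5] << 16)
             | ((uint32_t)bytes[6] << 24);
    return 0;
}
```

Applied to the seven bytes above, the fields come out as mod = 1, reg = 0, r/m = 5 (EBP), displacement = -16, and immediate = 0, which is the `[ebp-10h],0` shown by the disassembler.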

    The Role of Compilers


    Compiler goals:

    All correct programs execute correctly

    Most compiled programs execute fast (optimizations)

    Fast compilation

    Debugging support

    2.1 Introduction

    2.2 Classifying Instruction Set Architectures

    2.3 Memory Addressing

    2.4 Operations in the Instruction Set

    2.5 Type and Size of Operands

    2.6 Encoding and Instruction Set

    2.7 The Role of Compilers

    2.8 The DLX Architecture


    The Role of Compilers: Steps In Compilation

    Parsing --> intermediate representation

    Jump Optimization

    Loop Optimizations

    Register Allocation

    Code Generation --> assembly code

    Common Sub-Expression

    Procedure in-lining

    Constant Propagation

    Strength Reduction

    Pipeline Scheduling
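Three of the transformations named above (common sub-expression elimination, constant propagation, and strength reduction) can be illustrated by applying them by hand to a small function. This is only a source-level sketch with hypothetical function names; a real compiler performs these steps on its intermediate representation. Unsigned arithmetic is used so that the shifts are well defined in C.

```c
/* As a programmer might write it. */
unsigned unoptimized(unsigned a, unsigned b, unsigned i)
{
    unsigned scale = 8;             /* a known constant */
    unsigned x = (a + b) * scale;   /* a + b is computed here ... */
    unsigned y = (a + b) + i * 4;   /* ... and again here */
    return x + y;
}

/* The same computation after three hand-applied transformations. */
unsigned hand_optimized(unsigned a, unsigned b, unsigned i)
{
    /* Common sub-expression: compute a + b once and reuse it. */
    unsigned sum = a + b;

    /* Constant propagation replaces scale with 8; strength reduction
       then turns the multiply by 8 into a shift left by 3. */
    unsigned x = sum << 3;

    /* Strength reduction: i * 4 becomes i << 2. */
    unsigned y = sum + (i << 2);

    return x + y;
}
```

Both versions return the same value for every input; only the number and cost of the operations differ.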


    The Role of Compilers: Steps In Compilation

    Optimization Name    Explanation                                         % of the total number of optimizing transformations
    High Level           At or near the source level; machine-independent    Not Measured
    Local                Within Straight Line Code                           40%
    Global               Across A Branch                                     42%
    Machine Dependent    Depends on Machine Knowledge                        Not Measured


    The Role of Compilers: What compiler writers want:

    regularity

    orthogonality

    composability

    Compilers perform a giant case analysis; too many choices make it hard.

    Orthogonal instruction sets: operation, addressing mode, data type

    One solution or all possible solutions:

    2 branch conditions - eq, lt

    or all six - eq, ne, lt, gt, le, ge

    not 3 or 4

    There are advantages to having instructions that are primitives.

    Let the compiler put the instructions together to make more complex sequences.
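The "one solution" case above can be sketched in C: given only the two primitive conditions eq and lt, the other four comparisons follow by swapping operands or negating the result, which is the kind of composition a compiler can do mechanically. The function names are hypothetical, and signed 32-bit operands are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two primitive conditions assumed to be provided by the hardware. */
static bool prim_eq(int32_t a, int32_t b) { return a == b; }
static bool prim_lt(int32_t a, int32_t b) { return a < b; }

/* The remaining four are synthesized by swapping operands or negating. */
bool cond_ne(int32_t a, int32_t b) { return !prim_eq(a, b); }  /* not equal        */
bool cond_gt(int32_t a, int32_t b) { return  prim_lt(b, a); }  /* a > b  is b < a  */
bool cond_le(int32_t a, int32_t b) { return !prim_lt(b, a); }  /* a <= b is !(b<a) */
bool cond_ge(int32_t a, int32_t b) { return !prim_lt(a, b); }  /* a >= b is !(a<b) */
```

Providing three or four of the six conditions would force the compiler to special-case which ones exist, which is the irregularity the slide warns against.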

    The MIPS Architecture


    MIPS is very RISC oriented.

    MIPS will be used for many examples throughout the course.

    2.1 Introduction

    2.2 Classifying Instruction Set Architectures

    2.3 Memory Addressing

    2.4 Operations in the Instruction Set

    2.5 Type and Size of Operands

    2.6 Encoding and Instruction Set

    2.7 The Role of Compilers

    2.8 The MIPS Architecture


    The MIPS Architecture: MIPS Characteristics

    There's MIPS 32, which we learned in CS140:

    32-bit byte addresses aligned

    Load/store - only displacement addressing

    Standard datatypes

    3 fixed length formats

    32 32-bit GPRs (r0 = 0)

    16 64-bit (32 32-bit) FPRs

    FP status register

    No Condition Codes

    Data transfer

    load/store word, load/store byte/halfword signed?

    load/store FP single/double

    moves between GPRs and FPRs

    ALU

    add/subtract signed? immediate?

    multiply/divide signed?

    and, or, xor immediate?

    shifts: ll, rl, ra immediate?

    sets immediate?

    There's MIPS 64, the current architecture:

    Standard datatypes

    4 fixed length formats (8,16,32,64)

    32 64-bit GPRs (r0 = 0)

    64 64-bit FPRs

    Addressing Modes

    Immediate

    Displacement

    (Register Mode used only for ALU)

    The MIPS Architecture: MIPS Characteristics

    Control

    branches == 0, <> 0

    conditional branch testing FP bit

    jump, jump register

    jump & link, jump & link register

    trap, return-from-exception

    Floating Point

    add/sub/mul/div

    single/double

    fp converts, fp set

    The MIPS Architecture: The MIPS Encoding

    Instruction formats (bit positions, 31 = most significant):

    Register-Register:   Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | Opx [10:0] (subdivided at bits 6/5)
    Register-Immediate:  Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]
    Branch:              Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]
    Jump / Call:         Op [31:26] | target [25:0]
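Because the formats are fixed-length with fields at fixed bit positions, decoding is just shifting and masking. The sketch below extracts the fields of the register-immediate and jump formats shown on the slide; the helper names (`bits`, `field_op`, and so on) are hypothetical, and treating the 16-bit immediate as signed is an assumption that fits displacement addressing.

```c
#include <stdint.h>

/* Extract bits hi..lo (inclusive, bit 0 = least significant) of a word. */
static uint32_t bits(uint32_t word, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1u;
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu
                                   : (((uint32_t)1 << width) - 1u);
    return (word >> lo) & mask;
}

uint32_t field_op(uint32_t w)     { return bits(w, 31, 26); }  /* all formats         */
uint32_t field_rs1(uint32_t w)    { return bits(w, 25, 21); }  /* first source reg    */
uint32_t field_rd_imm(uint32_t w) { return bits(w, 20, 16); }  /* Rd, reg-immediate   */
uint32_t field_target(uint32_t w) { return bits(w, 25, 0);  }  /* jump / call target  */

/* 16-bit immediate, sign-extended to 32 bits (assumed signed). */
int32_t field_immediate(uint32_t w)
{
    uint32_t raw = bits(w, 15, 0);
    return (raw & 0x8000u) ? (int32_t)raw - 65536 : (int32_t)raw;
}
```

For the word 0x8FA8FFFC, these helpers yield Op = 0x23, Rs1 = 29, Rd = 8, and an immediate of -4.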

    RISC versus CISC


    BONUS

    combines 3 features

    architecture

    implementation

    compilers and OS

    argues that

    implementation effects are second order

    compilers are similar

    RISCs are better than CISCs: fair comparison?

    RISC versus CISC


    BONUS

    RISC factor = (CPI VAX * Instr VAX) / (CPI MIPS * Instr MIPS)

    Benchmark   Instruction Ratio   CPI MIPS   CPI VAX   CPI Ratio   RISC factor
    li          1.6                 1.1        6.5       6.0         3.7
    eqntott     1.1                 1.3        4.4       3.5         3.3
    fpppp       2.9                 1.5        15.2      10.5        2.7
    tomcatv     2.9                 2.1        17.5      8.2         2.9
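The RISC factor formula can be evaluated directly from the table columns. The sketch below assumes that the "Instruction Ratio" column means Instr MIPS / Instr VAX (the slide does not say so explicitly), in which case the formula reduces to the CPI ratio divided by the instruction ratio. The function name `risc_factor` is hypothetical.

```c
/*
 * RISC factor = (CPI_VAX * Instr_VAX) / (CPI_MIPS * Instr_MIPS).
 * With instr_ratio assumed to mean Instr_MIPS / Instr_VAX, this is
 * (CPI_VAX / CPI_MIPS) / instr_ratio.
 */
double risc_factor(double instr_ratio, double cpi_mips, double cpi_vax)
{
    double cpi_ratio = cpi_vax / cpi_mips;  /* how many more cycles per instruction on VAX */
    return cpi_ratio / instr_ratio;         /* discounted by the extra MIPS instructions   */
}
```

For the li row, `risc_factor(1.6, 1.1, 6.5)` gives about 3.7, in line with the table; small differences are expected because the table entries are rounded.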

    RISC versus CISC


    BONUS

    Compensating factors

    Increase VAX CPI but decrease VAX instruction count

    Increase MIPS instruction count

    e.g. 1: loads/stores versus operand specifiers

    e.g. 2: necessary complex instructions: loop branches

    Factors favoring VAX

    Big immediate values

    Not-taken branches incur no delay

    Factors favoring MIPS

    Operand specifier decoding

    Number of registers

    Separate floating point unit

    Simple branches/jumps (lower latency)

    No complex instructions

    Instruction scheduling

    Translation buffer

    Branch displacement size

    Wrapup


    2.1 Introduction

    2.2 Classifying Instruction Set Architectures

    2.3 Memory Addressing

    2.4 Operations in the Instruction Set

    2.5 Type and Size of Operands

    2.6 Encoding and Instruction Set

    2.7 The Role of Compilers

    2.8 The DLX Architecture

    Bonus