Computer Architecture Fundamentals

8/8/2019 Computer Architecture Fundamentals


Computer Architecture

Chapter 1: Fundamentals




Introduction

1.1 Introduction

1.2 The Task of a Computer Designer

1.3 Technology and Computer Usage Trends

1.4 Cost and Trends in Cost

1.5 Measuring and Reporting Performance

1.6 Quantitative Principles of Computer Design

1.7 Putting It All Together: The Concept of Memory Hierarchy


Art and Architecture

What's the difference between Art and Architecture?

[Image: Lyonel Feininger, Marktkirche in Halle]


Art and Architecture

What's the difference between Art and Architecture?

[Image: Notre Dame de Paris]


What's Computer Architecture?

"The attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."

Amdahl, Blaauw, and Brooks, 1964


What's Computer Architecture?

1950s to 1960s: Computer Architecture Course
    Computer Arithmetic.

1970s to mid 1980s: Computer Architecture Course
    Instruction Set Design, especially ISA appropriate for compilers. (What we'll do in Chapter 2)

1990s to 2000s: Computer Architecture Course
    Design of CPU, memory system, I/O system, Multiprocessors. (All evolving at a tremendous rate!)


The Task of a Computer Designer

[Diagram: a design cycle with three stages]
    Evaluate Existing Systems for Bottlenecks
    Simulate New Designs and Organizations
    Implement Next Generation System

[Diagram labels attached to the cycle]
    Technology Trends
    Benchmarks
    Workloads
    Implementation Complexity


Technology and Computer Usage Trends

When building a Cathedral, numerous very practical considerations need to be taken into account:
    available materials
    worker skills
    willingness of the client to pay the price.

Similarly, Computer Architecture is about working within constraints:
    What will the market buy?
    Cost/Performance
    Tradeoffs in materials and processes


Trends

Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip doubles every year; in 1975 he revised the rate to a doubling roughly every two years.

Exponential growth has CONTINUED since then.

[Figure: Transistors Per Chip, log scale from 1.E+03 to 1.E+08, plotted against year from 1970 to 2005. Labeled points: 4004, 8086, 80286, 386, 486, Pentium, Power PC 601, Pentium Pro, Pentium II, Power PC G3, Pentium 3]


Trends

Processor performance, as measured by the SPEC benchmark, has also risen dramatically.

[Figure: SPEC performance (0 to 5000) by year, 1987 to 2000. Labeled machines: Sun-4/260, MIPS M 2000, IBM RS/6000, DEC AXP/500, DEC Alpha 4/266, DEC Alpha 5/500, DEC Alpha 21264/600, Alpha 6/833]


Trends

Memory Capacity (and Cost) have changed dramatically in the last 20 years.

[Figure: memory size (log scale, 1000 to 1000000000) by year, 1970 to 2000]

year    size (Mb)    cycle time
1980    0.0625       250 ns
1983    0.25         220 ns
1986    1            190 ns
1989    4            165 ns
1992    16           145 ns
1996    64           120 ns
2000    256          100 ns


Trends

Measured by SPEED, the CPU has improved dramatically, but memory and disk have improved only a little. This has led to dramatic changes in architecture, Operating Systems, and Programming practices.

         Capacity         Speed (latency)
Logic    2x in 3 years    2x in 3 years
DRAM     4x in 3 years    2x in 10 years
Disk     4x in 3 years    2x in 10 years
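To get a feel for how quickly these rates diverge, here is a minimal sketch that compounds the speed figures from the table over one decade. The function name `growth` and the ten-year horizon are illustrative choices, not part of the slides.

```python
def growth(factor: float, period_years: float, years: float) -> float:
    """Compound improvement after `years`, given `factor`x every `period_years`."""
    return factor ** (years / period_years)

# Speed (latency) rates taken from the table above.
logic_speed = growth(2, 3, 10)    # 2x every 3 years, over a decade
dram_speed = growth(2, 10, 10)    # 2x every 10 years, over a decade

# How much further logic has pulled ahead of DRAM after the decade.
gap = logic_speed / dram_speed
print(f"logic: {logic_speed:.1f}x, DRAM: {dram_speed:.1f}x, gap: {gap:.1f}x")
```

Under these rates the processor-memory gap grows by roughly a factor of five per decade, which is the pressure behind the memory hierarchy discussed later in the chapter.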


Measuring And Reporting Performance

This section talks about:

1. Metrics: how do we describe in a numerical way the performance of a computer?

2. What tools do we use to find those metrics?


Metrics

Time to run the task (ExTime)
    Execution time, response time, latency

Tasks per day, hour, week, sec, ns (Performance)
    Throughput, bandwidth

Plane               DC to Paris    Speed       Passengers    Throughput (pmph)
Boeing 747          6.5 hours      610 mph     470           286,700
BAC/Sud Concorde    3 hours        1350 mph    132           178,200
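The throughput column is passengers multiplied by speed (passenger-miles per hour). A small sketch that reproduces it from the other columns; the dictionary layout and the name `throughput_pmph` are illustrative.

```python
# Figures copied from the plane table above.
planes = {
    "Boeing 747": {"speed_mph": 610, "passengers": 470},
    "Concorde": {"speed_mph": 1350, "passengers": 132},
}

def throughput_pmph(plane: dict) -> int:
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["speed_mph"]

for name, plane in planes.items():
    print(f"{name}: {throughput_pmph(plane):,} pmph")
```

The faster plane wins on latency while the larger plane wins on throughput, which is why the two metrics are reported separately.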


Metrics - Comparisons

"X is n times faster than Y" means

        ExTime(Y)     Performance(X)
    n = ---------  =  --------------
        ExTime(X)     Performance(Y)

Speed of Concorde vs. Boeing 747

Throughput of Boeing 747 vs. Concorde
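The two comparisons can be worked with the definition above. This is a sketch using the figures from the plane table; the function names are illustrative.

```python
def times_faster(perf_x: float, perf_y: float) -> float:
    """n such that X is n times faster than Y, from performance (higher is better)."""
    return perf_x / perf_y

def times_faster_from_time(time_x: float, time_y: float) -> float:
    """The same n, computed from execution times (lower is better)."""
    return time_y / time_x

speed_n = times_faster(1350, 610)               # Concorde vs. Boeing 747, mph
trip_n = times_faster_from_time(3.0, 6.5)       # Concorde vs. Boeing 747, hours
throughput_n = times_faster(286_700, 178_200)   # Boeing 747 vs. Concorde, pmph

print(f"speed: {speed_n:.2f}, trip time: {trip_n:.2f}, throughput: {throughput_n:.2f}")
```

The speed and trip-time ratios differ slightly because the table's trip times are rounded.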


Metrics - Comparisons

Pat has developed a new product, "rabbit", whose performance she wishes to determine. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons

Product    Transactions/second    Seconds/transaction    Seconds to process transaction
Turtle     30                     0.0333                 3
Rabbit     60                     0.0167                 1

Which of the following statements reflect the performance comparison of rabbit and turtle?

o Rabbit is 100% faster than turtle.

o Rabbit is twice as fast as turtle.

o Rabbit takes 1/2 as long as turtle.

o Rabbit takes 1/3 as long as turtle.

o Rabbit takes 100% less time than turtle.

o Rabbit takes 200% less time than turtle.

o Turtle is 50% as fast as rabbit.

o Turtle is 50% slower than rabbit.

o Turtle takes 200% longer than rabbit.

o Turtle takes 300% longer than rabbit.
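Part of the difficulty is that the throughput column and the "seconds to process" column give different ratios, and that "percent faster" and "percent less time" are not symmetric. The sketch below only does the arithmetic; `percent_change` is an illustrative helper, and it does not decide which statements the slide intends as correct.

```python
def percent_change(new: float, old: float) -> float:
    """Signed percentage change going from `old` to `new`."""
    return (new / old - 1) * 100

throughput = {"turtle": 30, "rabbit": 60}   # transactions per second
latency = {"turtle": 3, "rabbit": 1}        # seconds to process a transaction

# Throughput view: rabbit handles 100% more transactions per second.
rabbit_vs_turtle_tput = percent_change(throughput["rabbit"], throughput["turtle"])
# The same comparison reversed: turtle handles 50% fewer.
turtle_vs_rabbit_tput = percent_change(throughput["turtle"], throughput["rabbit"])
# Latency view: rabbit takes about 66.7% less time, i.e. 1/3 as long.
rabbit_vs_turtle_time = percent_change(latency["rabbit"], latency["turtle"])
# Reversed: turtle takes 200% longer.
turtle_vs_rabbit_time = percent_change(latency["turtle"], latency["rabbit"])

print(rabbit_vs_turtle_tput, turtle_vs_rabbit_tput,
      round(rabbit_vs_turtle_time, 1), turtle_vs_rabbit_time)
```

Note that no quantity can take "100% less time" or more, since that would mean zero or negative time.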


Metrics - Throughput

Levels of the system, from top to bottom:
    Application
    Programming Language
    Compiler
    ISA
    Datapath, Control
    Function Units
    Transistors, Wires, Pins

Throughput metrics used at these levels:
    Answers per month
    Operations per second
    (millions) of Instructions per second: MIPS
    (millions) of (FP) operations per second: MFLOP/s
    Megabytes per second
    Cycles per second (clock rate)


Methods For Predicting Performance

Benchmarks, Traces, Mixes
Hardware: Cost, delay, area, power estimation
Simulation (many levels)
    ISA, RT, Gate, Circuit
Queuing Theory
Rules of Thumb
Fundamental Laws/Principles


Benchmarks

SPEC: System Performance Evaluation Cooperative

First Round 1989
    10 programs yielding a single number (SPECmarks)

Second Round 1992
    SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
    Compiler Flags unlimited. March 93 of DEC 4000 Model 610:
        spice: unix.c:/def=(sysv,has_bcopy,bcopy(a,b,c)=memcpy(b,a,c)
        wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
        nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas

Third Round 1995
    new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
    benchmarks useful for 3 years
    Single flag setting for all programs: SPECint_base95, SPECfp_base95


Benchmarks

CINT2000 (Integer Component of SPEC CPU2000):

Program        Language    What Is It
164.gzip       C           Compression
175.vpr        C           FPGA Circuit Placement and Routing
176.gcc        C           C Programming Language Compiler
181.mcf        C           Combinatorial Optimization
186.crafty     C           Game Playing: Chess
197.parser     C           Word Processing
252.eon        C++         Computer Visualization
253.perlbmk    C           PERL Programming Language
254.gap        C           Group Theory, Interpreter
255.vortex     C           Object-oriented Database
256.bzip2      C           Compression
300.twolf      C           Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/


Benchmarks

CFP2000 (Floating Point Component of SPEC CPU2000):

Program         Language      What Is It
168.wupwise     Fortran 77    Physics / Quantum Chromodynamics
171.swim        Fortran 77    Shallow Water Modeling
172.mgrid       Fortran 77    Multi-grid Solver: 3D Potential Field
173.applu       Fortran 77    Parabolic / Elliptic Differential Equations
177.mesa        C             3-D Graphics Library
178.galgel      Fortran 90    Computational Fluid Dynamics
179.art         C             Image Recognition / Neural Networks
183.equake      C             Seismic Wave Propagation Simulation
187.facerec     Fortran 90    Image Processing: Face Recognition
188.ammp        C             Computational Chemistry
189.lucas       Fortran 90    Number Theory / Primality Testing
191.fma3d       Fortran 90    Finite-element Crash Simulation
200.sixtrack    Fortran 77    High Energy Physics Accelerator Design
301.apsi        Fortran 77    Meteorology: Pollutant Distribution

http://www.spec.org/osg/cpu2000/CFP2000/


Benchmarks

Sample Results For SPECint2000: Intel OR840 (1 GHz Pentium III processor)

               Base        Base        Base     Peak        Peak        Peak
Benchmarks     Ref Time    Run Time    Ratio    Ref Time    Run Time    Ratio
164.gzip       1400        277         505*     1400        270         518*
175.vpr        1400        419         334*     1400        417         336*
176.gcc        1100        275         399*     1100        272         405*
181.mcf        1800        621         290*     1800        619         291*
186.crafty     1000        191         522*     1000        191         523*
197.parser     1800        500         360*     1800        499         361*
252.eon        1300        267         486*     1300        267         486*
253.perlbmk    1800        302         596*     1800        302         596*
254.gap        1100        249         442*     1100        248         443*
255.vortex     1900        268         710*     1900        264         719*
256.bzip2      1500        389         386*     1500        375         400*
300.twolf      3000        784         382*     3000        776         387*

SPECint_base2000    438
SPECint2000         442

http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
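The Ratio column is 100 times the reference time divided by the run time, and SPEC combines the per-program ratios with a geometric mean. The sketch below recomputes the base figure from the whole-second times printed in the table, so it can differ from the reported 438 by a point or so; the helper names are illustrative.

```python
import math

# (reference time, base run time) in seconds, copied from the table above.
base_times = {
    "164.gzip": (1400, 277), "175.vpr": (1400, 419), "176.gcc": (1100, 275),
    "181.mcf": (1800, 621), "186.crafty": (1000, 191), "197.parser": (1800, 500),
    "252.eon": (1300, 267), "253.perlbmk": (1800, 302), "254.gap": (1100, 249),
    "255.vortex": (1900, 268), "256.bzip2": (1500, 389), "300.twolf": (3000, 784),
}

def spec_ratio(ref_time: float, run_time: float) -> float:
    """Per-benchmark ratio: 100 * reference time / measured run time."""
    return 100 * ref_time / run_time

def geometric_mean(values: list) -> float:
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

ratios = {name: spec_ratio(ref, run) for name, (ref, run) in base_times.items()}
spec_int_base = geometric_mean(list(ratios.values()))
print(f"SPECint_base2000 is approximately {spec_int_base:.0f}")
```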


Benchmarks

Performance Evaluation

For better or worse, benchmarks shape a field.

Good products are created when we have:
    Good benchmarks
    Good ways to summarize performance

Given that sales is in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.

If the benchmarks/summary are inadequate, then the choice is between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!

Execution time is the measure of computer performance!


Benchmarks

How to Summarize Performance

Management would like to have one number.

Technical people want more:

1. They want to have evidence of reproducibility: there should be enough information so that you or someone else can repeat the experiment.

2. There should be consistency when doing the measurements multiple times.

How would you report these results?

                     Computer A    Computer B    Computer C
Program P1 (secs)    1             10            20
Program P2 (secs)    1000          100           20
Total Time (secs)    1001          110           40
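One way to see why the choice of summary matters is to compute two common ones side by side. This sketch uses total time and the geometric mean of times normalized to a reference machine; both helpers are illustrative, and the slides do not say which summary they prefer here.

```python
import math

# Execution times in seconds, from the table above.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}

def total_time(machine: str) -> float:
    """Sum of the execution times of all programs on one machine."""
    return sum(times[machine].values())

def normalized_geometric_mean(machine: str, reference: str) -> float:
    """Geometric mean of per-program times, each divided by the reference machine's time."""
    ratios = [times[machine][p] / times[reference][p] for p in times[machine]]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

for m in times:
    print(m, total_time(m), round(normalized_geometric_mean(m, "A"), 3))
```

By total time C is clearly fastest and A slowest, yet the normalized geometric mean rates A and B as equal. The two summaries answer different questions, so a report should say which one it uses.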


Quantitative Principles of Computer Design

Make the common case fast.

Amdahl's Law:
    Relates total speedup of a system to the speedup of some portion of that system.


Amdahl's Law

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

Speedup due to enhancement E:

\[
\mathrm{Speedup}(E)
  = \frac{\text{Execution Time Without Enhancement}}{\text{Execution Time With Enhancement}}
  = \frac{\text{Performance With Enhancement}}{\text{Performance Without Enhancement}}
\]


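Amdahl's Law

\[
\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times
  \left[ (1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right]
\]

\[
\mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}}
  = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}}
\]

[Figure: ExTime_old and ExTime_new drawn as bars, with the enhanced fraction marked]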


Amdahl's Law

Floating point instructions improved to run 2X; but only 10% of actual instructions are FP.

ExTime_new = ExTime_old x (0.9 + 0.1/2) = 0.95 x ExTime_old

Speedup_overall = 1 / 0.95 = 1.053
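The floating point example can be checked with a direct transcription of Amdahl's Law. This is a minimal sketch; the function name and the input validation are illustrative additions.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the time runs `speedup_enhanced` times faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# 10% of the work is floating point, and it becomes 2x faster.
fp_example = amdahl_speedup(0.10, 2.0)

# Upper bound: even an infinitely fast FP unit leaves the other 90% untouched.
fp_limit = 1.0 / (1.0 - 0.10)

print(f"speedup: {fp_example:.3f}, limit: {fp_limit:.3f}")
```

The limit of about 1.11 shows why the rule is to make the common case fast: speeding up a rare case cannot pay off much, however large the local improvement.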


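Cycles Per Instruction

Invest Resources where time is Spent!

\[
\mathrm{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}}
             = \frac{\text{Cycles}}{\text{Instruction Count}}
\]

\[
\text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \mathrm{CPI}_i \times I_i
\]

where I_i is the number of instructions of type i.

Instruction Frequency:

\[
\mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times F_i
\qquad \text{where} \qquad
F_i = \frac{I_i}{\text{Instruction Count}}
\]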


Cycles Per Instruction

Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.

Base Machine (Reg / Reg)

Op        Freq    Cycles    CPI(i)    (% Time)
ALU       50%     1         .5        (33%)
Load      20%     2         .4        (27%)
Store     10%     2         .2        (13%)
Branch    20%     2         .4        (27%)
Total CPI                   1.5

How do we get CPI(i)?
How do we get % time?
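Both questions follow from the frequency form of the CPI formula: CPI(i) is frequency times cycles, and % time is CPI(i) divided by the total CPI. A short sketch using the base machine's numbers; the variable names are illustrative.

```python
# op: (frequency, cycles), copied from the base machine table above.
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# CPI(i) = F_i * CPI_i: each instruction type's contribution to the overall CPI.
cpi_contribution = {op: freq * cycles for op, (freq, cycles) in mix.items()}
total_cpi = sum(cpi_contribution.values())

# % time = share of all cycles spent in each instruction type.
percent_time = {op: 100 * c / total_cpi for op, c in cpi_contribution.items()}

for op in mix:
    print(f"{op:7s} CPI(i)={cpi_contribution[op]:.1f}  time={percent_time[op]:.0f}%")
print(f"Total CPI = {total_cpi:.1f}")
```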


Locality of Reference

Programs access a relatively small portion of the address space at any instant of time.

There are two different types of locality:

Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)

Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight line code, array access, etc.)
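Spatial locality can be made concrete by comparing two traversal orders of the same array. The sketch below generates the byte addresses each order touches and counts how often consecutive accesses land in the same block. The array shape, element size, and 64-byte block size are assumptions chosen for illustration.

```python
ROWS, COLS = 64, 64
ELEMENT_BYTES = 4
BLOCK_BYTES = 64   # hypothetical cache block size

def address(row: int, col: int) -> int:
    """Byte address of element [row][col] in a row-major array starting at address 0."""
    return (row * COLS + col) * ELEMENT_BYTES

def same_block_fraction(trace: list) -> float:
    """Fraction of accesses that fall in the same block as the access just before."""
    blocks = [addr // BLOCK_BYTES for addr in trace]
    same = sum(1 for prev, cur in zip(blocks, blocks[1:]) if prev == cur)
    return same / (len(blocks) - 1)

# Row by row: neighbours in memory are visited back to back.
row_major = [address(r, c) for r in range(ROWS) for c in range(COLS)]
# Column by column: each access jumps a whole row (256 bytes) ahead.
col_major = [address(r, c) for c in range(COLS) for r in range(ROWS)]

print(f"row-major: {same_block_fraction(row_major):.2f}")
print(f"column-major: {same_block_fraction(col_major):.2f}")
```

With these sizes the row-major walk stays in the same block for most accesses, while the column-major walk never does, even though both touch exactly the same elements.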


The Concept of Memory Hierarchy

Fast memory is expensive.

Slow memory is cheap.

The goal is to minimize the price/performance for a particular price point.


Memory Hierarchy

                      Registers          Level 1 cache    Level 2 cache    Memory        Disk
Access Time           1 nsec             3 nsec           15 nsec          150 nsec      5,000,000 nsec
Bandwidth (MB/sec)    10,000 - 50,000    2000 - 5000      500 - 1000       500 - 1000    100
Managed By            Compiler           Hardware         Hardware         OS            OS/User
Size                                                                                     > 5 Gigabytes


Memory Hierarchy

Hit: data appears in some block in the upper level (example: Block X)
    Hit Rate: the fraction of memory accesses found in the upper level
    Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss

Miss: data needs to be retrieved from a block in the lower level (Block Y)
    Miss Rate = 1 - (Hit Rate)
    Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor

Hit Time << Miss Penalty


Memory Hierarchy

Registers, Level 1 cache, Level 2 Cache, Memory, Disk

What is the cost of executing a program if:
    Stores are free (there's a write pipe)
    Loads are 20% of all instructions
    80% of loads hit (are found) in the Level 1 cache
    97% of loads hit in the Level 2 cache.
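The question leaves several things open, so the following is only one possible reading, with every assumption marked. It takes the access times from the memory hierarchy table (3, 15, and 150 nsec), treats the 97% figure as cumulative (loads found in either Level 1 or Level 2), charges a load only the access time of the level where it hits, and assumes every non-load instruction costs 1 nsec.

```python
# Access times in nsec, taken from the memory hierarchy table in this chapter.
L1_NS, L2_NS, MEMORY_NS = 3, 15, 150

LOAD_FRACTION = 0.20   # loads are 20% of all instructions
L1_HIT = 0.80          # fraction of loads found in the Level 1 cache
L1_OR_L2_HIT = 0.97    # ASSUMPTION: 97% is cumulative (found in L1 or L2)
OTHER_NS = 1           # ASSUMPTION: every non-load instruction costs 1 nsec

# Weighted average over where a load is finally satisfied.
avg_load_ns = (L1_HIT * L1_NS
               + (L1_OR_L2_HIT - L1_HIT) * L2_NS
               + (1 - L1_OR_L2_HIT) * MEMORY_NS)

# Weighted average over the instruction mix.
avg_instruction_ns = (1 - LOAD_FRACTION) * OTHER_NS + LOAD_FRACTION * avg_load_ns

print(f"average load: {avg_load_ns:.2f} nsec")
print(f"average instruction: {avg_instruction_ns:.2f} nsec")
```

Under this reading the 3% of loads that reach main memory account for almost half of the average load time, which illustrates why miss rates matter far more than their size suggests.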


Wrap Up

1.1 Introduction

1.2 The Task of a Computer Designer

1.3 Technology and Computer Usage Trends

1.4 Cost and Trends in Cost

1.5 Measuring and Reporting Performance

1.6 Quantitative Principles of Computer Design

1.7 Putting It All Together: The Concept of Memory Hierarchy


Computer Architecture

Chapter 2: Instruction Sets


Introduction

2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The MIPS Architecture

Bonus


Introduction

The Instruction Set Architecture is that portion of the machine visible to the assembly level programmer or to the compiler writer.

[Diagram: the instruction set as the interface between software and hardware]

1. What are the advantages and disadvantages of various instruction set alternatives?

2. How do languages and compilers affect the ISA?

3. Use the DLX architecture as an example of a RISC architecture.


Classifying Instruction Set Architectures

Classifications can be by:

1. Stack/accumulator/register

2. Number of memory operands.

3. Number of total operands.


Instruction Set Architectures: Basic ISA Classes

Accumulator:
    1 address      add A        acc <- acc + mem[A]
    1+x address    addx A       acc <- acc + mem[A + x]

Stack:
    0 address      add          tos <- tos + next

General Purpose Register:
    2 address      add A B      EA(A) <- EA(A) + EA(B)
    3 address      add A B C    EA(A) <- EA(B) + EA(C)
    ALU Instructions can have two or three operands.

Load/Store:
    0 Memory       load R1, Mem1
                   load R2, Mem2
                   add R1, R2
    1 Memory       add R1, Mem2
    ALU Instructions can have 0, 1, 2, 3 operands. Shown here are cases of 0 and 1.


Instruction Set Architectures: Basic ISA Classes

The results of different address classes are easiest to see with the examples here, all of which implement the sequence for C = A + B.

Stack     Accumulator    Register             Register
                         (register-memory)    (load-store)
Push A    Load A         Load R1, A           Load R1, A
Push B    Add B          Add R1, B            Load R2, B
Add       Store C        Store C, R1          Add R3, R1, R2
Pop C                                         Store C, R3

Registers are the class that won out. The more registers on the CPU, the better.
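To see how the operand is implicit in the first two classes, here is a toy interpreter for the stack and accumulator sequences for C = A + B. It is a sketch for illustration only: the tuple encoding of instructions and the function names are invented, and memory is modelled as a dictionary.

```python
def run_stack(program: list, memory: dict) -> dict:
    """Stack machine: Add takes both operands from the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(memory[args[0]])
        elif op == "Add":
            stack.append(stack.pop() + stack.pop())
        elif op == "Pop":
            memory[args[0]] = stack.pop()
    return memory

def run_accumulator(program: list, memory: dict) -> dict:
    """Accumulator machine: one operand is always the accumulator."""
    acc = 0
    for op, name in program:
        if op == "Load":
            acc = memory[name]
        elif op == "Add":
            acc += memory[name]
        elif op == "Store":
            memory[name] = acc
    return memory

stack_program = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
acc_program = [("Load", "A"), ("Add", "B"), ("Store", "C")]

print(run_stack(stack_program, {"A": 2, "B": 3})["C"])
print(run_accumulator(acc_program, {"A": 2, "B": 3})["C"])
```

The stack version needs four instructions with zero or one address each, while the accumulator version needs three instructions with one address each.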


Instruction Set Architectures

Intel 80x86 Integer Registers

GPR0    EAX       Accumulator
GPR1    ECX       Count register, string, loop
GPR2    EDX       Data Register; multiply, divide
GPR3    EBX       Base Address Register
GPR4    ESP       Stack Pointer
GPR5    EBP       Base Pointer for base of stack seg.
GPR6    ESI       Index Register
GPR7    EDI       Index Register
        CS        Code Segment Pointer
        SS        Stack Segment Pointer
        DS        Data Segment Pointer
        ES        Extra Data Segment Pointer
        FS        Data Seg. 2
        GS        Data Seg. 3
PC      EIP       Instruction Counter
        Eflags    Condition Codes


Memory Addressing

Sections Include:
    Interpreting Memory Addresses
    Addressing Modes
    Displacement Address Mode
    Immediate Address Mode


Memory Addressing: Interpreting Memory Addresses

What object is accessed as a function of the address and length?

Objects have byte addresses: an address refers to the number of bytes counted from the beginning of memory.

Little Endian: puts the byte whose address is xx00 at the least significant position in the word.

Big Endian: puts the byte whose address is xx00 at the most significant position in the word.

Alignment: data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.
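Byte order and alignment are easy to check experimentally. The sketch below uses Python's standard `struct` module to lay out the same 32-bit value both ways; the sample value and the helper `is_aligned` are illustrative.

```python
import struct

value = 0x0A0B0C0D

# "<I" = little-endian unsigned 32-bit; ">I" = big-endian unsigned 32-bit.
little = struct.pack("<I", value)   # least significant byte at the lowest address
big = struct.pack(">I", value)      # most significant byte at the lowest address

def is_aligned(address: int, size: int) -> bool:
    """True when `address` falls on a boundary equal to the object's size."""
    return address % size == 0

print("little endian:", little.hex())
print("big endian:   ", big.hex())
print(is_aligned(8, 4), is_aligned(6, 4))
```

Index 0 of each byte string corresponds to the lowest address, so the little-endian layout starts with 0D and the big-endian layout starts with 0A.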


Memory Addressing: Addressing Modes

This table shows the most common modes. A more complete set is in Figure 2.6.

Used for static data.


Addressing Modes

Mode                 Example                Meaning
Register             add r4, r3             R[4] <- R[4] + R[3]
Immediate            add r4, #3             R[4] <- R[4] + 3
Displacement         add r4, 100(r1)        R[4] <- R[4] + M[100 + R[1]]
Register indirect    add r4, (r1)           R[4] <- R[4] + M[R[1]]
Indexed              add r3, (r1+r2)        R[3] <- R[3] + M[R[1] + R[2]]
Direct/Absolute      add r1, (1001)         R[1] <- R[1] + M[1001]
Memory indirect      add r1, @(r3)          R[1] <- R[1] + M[M[R[3]]]
Autoincrement        add r1, (r2)+          R[1] <- R[1] + M[R[2]]; R[2] <- R[2] + d
Autodecrement        add r1, -(r2)          R[2] <- R[2] - d; R[1] <- R[1] + M[R[2]]
Scaled               add r1, 100(r2)[r3]    R[1] <- R[1] + M[100 + R[2] + R[3]*d]

Notation: ( ) denotes a memory access; [ ] denotes accessing a Register or Memory location.
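The Meaning column boils down to how each mode forms the effective address passed to M[ ]. A sketch of that calculation for several modes follows; the register contents, the keyword-argument interface, and the choice of d = 4 bytes are assumptions made for the example.

```python
D = 4  # ASSUMPTION: operand size in bytes, the "d" in the table

def effective_address(mode: str, regs: dict, **f) -> int:
    """Address used for the memory access, for a subset of the modes in the table."""
    if mode == "displacement":        # 100(r1)     -> 100 + R[1]
        return f["disp"] + regs[f["base"]]
    if mode == "register_indirect":   # (r1)        -> R[1]
        return regs[f["base"]]
    if mode == "indexed":             # (r1+r2)     -> R[1] + R[2]
        return regs[f["base"]] + regs[f["index"]]
    if mode == "direct":              # (1001)      -> 1001
        return f["addr"]
    if mode == "scaled":              # 100(r2)[r3] -> 100 + R[2] + R[3]*d
        return f["disp"] + regs[f["base"]] + regs[f["index"]] * D
    raise ValueError(f"unsupported mode: {mode}")

regs = {1: 0x1000, 2: 0x2000, 3: 4}   # illustrative register contents

print(effective_address("displacement", regs, disp=100, base=1))
print(effective_address("indexed", regs, base=1, index=2))
print(effective_address("scaled", regs, disp=100, base=2, index=3))
```

Register and immediate modes are left out because they involve no memory access and therefore no effective address.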


Memory Addressing: Displacement Addressing Mode

How big should the displacement be?

For addresses that do fit in displacement size:
    Add R4, 10000 (R0)

For addresses that don't fit in displacement size, the compiler must do the following:
    Load R1, address
    Add R4, 0 (R1)

How big the field should be depends on the displacements that typically occur.

On DLX, the space allocated is 16 bits. IA-32 encodes a displacement in 8 or 32 bits (8 or 16 bits in 16-bit addressing mode).
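The compiler's decision between the two sequences comes down to a range check on the displacement. A minimal sketch, assuming the displacement field is a signed two's complement value; `fits_signed` is an illustrative helper name.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True when `value` is representable as a signed two's complement field of `bits` bits."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high

# 10000 fits a 16-bit displacement, so "Add R4, 10000 (R0)" can be used directly.
print(fits_signed(10000, 16))
# 40000 does not, so the address must first be loaded into a register.
print(fits_signed(40000, 16))
```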


Memory Addressing: Immediate Address Mode

Used where we want to get to a numerical value in an instruction.

At high level:
    a = b + 3;
    if ( a > 17 )
        goto Addr

At Assembler level:
    Load R2, 3
    Add R0, R1, R2
    Load R2, 17
    CMPBGT R1, R2
    Load R1, Address
    Jump (R1)

So how would you get a 32 bit value into a register?
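One common answer on machines with 16-bit immediates is to build the value in two steps: place the upper 16 bits with one instruction, then OR in the lower 16 bits with a second (MIPS does this with lui followed by ori). The sketch below shows only the arithmetic of that split; the function names are illustrative and no particular instruction encoding is implied.

```python
def split_immediate(value: int):
    """Split a 32-bit value into (upper 16 bits, lower 16 bits)."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

def rebuild(upper: int, lower: int) -> int:
    """What the two-instruction sequence leaves in the register."""
    register = upper << 16       # step 1: load the upper half, lower half zeroed
    register |= lower            # step 2: OR in the lower half
    return register

upper, lower = split_immediate(0x12345678)
print(hex(upper), hex(lower), hex(rebuild(upper, lower)))
```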


Operations In The Instruction Set

Sections Include:
    Detailed information about types of instructions.
    Instructions for Control Flow (conditional branches, jumps)


Operations In The Instruction Set: Operator Types

Arithmetic and logical - and, add
Data transfer - move, load
Control - branch, jump, call
System - system call, traps
Floating point - add, mul, div, sqrt
Decimal - add, convert
String - move, compare
Multimedia - 2D, 3D? e.g., Intel MMX and Sun VIS


Operations In The Instruction Set: Control Instructions

Conditional branches are 20% of all instructions!!

Control Instructions Issues:
    taken or not
    where is the target
    link return address
    save or restore

Instructions that change the PC:
    (conditional) branches, (unconditional) jumps
    function calls, function returns
    system calls, system returns



Operations In The Instruction Set: Control Instructions

There are numerous tradeoffs:

Compare and branch
    + no extra compare, no state passed between instructions
    -- requires ALU op, restricts code scheduling opportunities

Implicitly set condition codes - Z, N, V, C
    + can be set "for free"
    -- constrains code reordering, extra state to save/restore

Explicitly set condition codes
    + can be set "for free", decouples branch/fetch from pipeline
    -- extra state to save/restore

Condition in general-purpose register
    + no special state, but uses up a register
    -- branch condition separate from branch logic in pipeline

Some data for MIPS
    > 80% branches use immediate data, > 80% of those zero
    50% branches use == 0 or != 0

Compromise in MIPS
    branch == 0, branch != 0
    compare instructions for all other compares



Operations In The Instruction Set: Control Instructions

Link Return Address:

implicit register - many recent architectures use this
    + fast, simple
    -- s/w save register before next call, surprise traps?

explicit register
    + may avoid saving register
    -- register must be specified

processor stack
    + recursion direct
    -- complex instructions

Save or restore state:

What state?
    function calls: registers
    system calls: registers, flags, PC, PSW, etc.

Hardware need not save registers
    Caller can save registers in use
    Callee save registers it will use

Hardware register save
    IBM STM, VAX CALLS
    Faster?

Many recent architectures do no register saving
    Or do implicit register saving with register windows (SPARC)


Type And Size of Operands

The type of the operand is usually encoded in the Opcode: a LDW implies loading of a word.

Common sizes are:
    Character (1 byte)
    Half word (16 bits)
    Word (32 bits)
    Single Precision Floating Point (1 Word)
    Double Precision Floating Point (2 Words)

Integers are two's complement binary.

Floating point is IEEE 754.

Some languages (like COBOL) use packed decimal.


Encoding And Instruction Set

This section has to do with how an assembly level instruction is encoded into binary.

Ultimately, it's the binary that is read and interpreted by the machine.

We will be using the Intel instruction set, which is defined at: http://developer.intel.com/design/Pentium4/manuals.

Volume 2 has the instruction set.


Encoding And Instruction Set: 80x86 Instruction Encoding

Here's some sample code that's been disassembled. It was compiled with the debugger option so is not optimized. This code was produced using Visual Studio.

for ( index = 0; index < iterations; index++ )
0040D3AF C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0
0040D3B6 EB 09 jmp main+0D1h (0040d3c1)
0040D3B8 8B 4D F0 mov ecx,dword ptr [ebp-10h]
0040D3BB 83 C1 01 add ecx,1
0040D3BE 89 4D F0 mov dword ptr [ebp-10h],ecx
0040D3C1 8B 55 F0 mov edx,dword ptr [ebp-10h]
0040D3C4 3B 55 F8 cmp edx,dword ptr [ebp-8]
0040D3C7 7D 15 jge main+0EEh (0040d3de)
long_temp = (*alignment + long_temp) % 47;
0040D3C9 8B 45 F4 mov eax,dword ptr [ebp-0Ch]
0040D3CC 8B 00 mov eax,dword ptr [eax]
0040D3CE 03 45 EC add eax,dword ptr [ebp-14h]
0040D3D1 99 cdq
0040D3D2 B9 2F 00 00 00 mov ecx,2Fh
0040D3D7 F7 F9 idiv eax,ecx
0040D3D9 89 55 EC mov dword ptr [ebp-14h],edx
0040D3DC EB DA jmp main+0C8h (0040d3b8)
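The two-byte jumps in the listing (opcodes EB and 7D) carry a signed 8-bit displacement that is relative to the address of the next instruction. The following sketch recomputes the targets shown in parentheses in the disassembly; `rel8_target` is an illustrative helper, not a disassembler API.

```python
def rel8_target(address: int, encoding: bytes) -> int:
    """Target of a two-byte short jump: next instruction address plus signed displacement."""
    displacement = int.from_bytes(encoding[1:2], "little", signed=True)
    return address + len(encoding) + displacement

# (address, bytes) pairs copied from the listing above.
jumps = [
    (0x0040D3B6, bytes([0xEB, 0x09])),   # jmp, forward
    (0x0040D3C7, bytes([0x7D, 0x15])),   # jge, forward
    (0x0040D3DC, bytes([0xEB, 0xDA])),   # jmp, backward to the loop increment
]

for address, encoding in jumps:
    print(f"{address:08X} -> {rel8_target(address, encoding):08X}")
```

The last jump shows why the displacement has to be signed: DA hex is -38, which takes control back to the increment at 0040D3B8.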



Encoding And Instruction Set: 80x86 Instruction Encoding

Here's some sample code that's been disassembled. It was compiled with optimization. This code was produced using Visual Studio.

00401000 8B 0D 40 54 40 00 mov ecx,dword ptr ds:[405440h]
00401006 33 D2 xor edx,edx
00401008 85 C9 test ecx,ecx
0040100A 7E 14 jle 00401020
0040100C 56 push esi
0040100D 57 push edi
0040100E 8B F1 mov esi,ecx
long_temp = (*alignment + long_temp) % 47;
00401010 8D 04 11 lea eax,[ecx+edx]
00401013 BF 2F 00 00 00 mov edi,2Fh
00401018 99 cdq
00401019 F7 FF idiv eax,edi
0040101B 4E dec esi
0040101C 75 F2 jne 00401010
0040101E 5F pop edi
0040101F 5E pop esi
00401020 C3 ret



Encoding And Instruction Set: 80x86 Instruction Encoding

Here's some sample code that's been disassembled. It was compiled with optimization. This code was produced using gcc and gdb. For details, see Lab 2.1.

0x804852f : add $0x10,%esp
0x8048532 : lea 0xfffffff8(%ebp),%edx
0x8048535 : test %esi,%esi
0x8048537 : jle 0x8048543
0x8048539 : mov %esi,%eax
0x804853b : nop
0x804853c : lea 0x0(%esi,1),%esi
long_temp = (*alignment + long_temp) % 47;
0x8048540 : dec %eax
0x8048541 : jne 0x8048540
0x8048543 : add $0xfffffff4,%esp

Note that the representation of the code is dependent on the compiler/debugger!



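Encoding And Instruction Set: Encoding

A Morass of disjoint encoding!! This is Figure D.8. Field widths in bits are shown in parentheses.

ADD     ADD (4) | Reg (3) | W (1) | Disp. (8)
SHL     SHL (6) | V/w (2) | postbyte (8) | Disp. (8)
TEST    TEST (7) | W (1) | postbyte (8) | Immediate (8)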



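Encoding And Instruction Set: Encoding

Field widths in bits are shown in parentheses.

JE       JE (4) | Cond (4) | Disp. (8)
CALLF    CALLF (8) | Offset (16) | Segment Number (16)
MOV      MOV (6) | D/w (2) | postbyte (8) | Disp. (8)
PUSH     PUSH (5) | Reg (3)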





Here's the instruction that we had several pages ago:

0040D3AF C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0

It is described in:

http://developer.intel.com/design/pentium4/manuals/245471.htm

(I found it on page 479, but this is obviously version dependent.)

C7 /0 MOV r/m32,imm32 Move an immediate 32 bit data item to a register or to memory.

Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, or a doubleword.

In our case, because of the C7 Opcode, we know it's a sub-flavor of MOV putting an immediate value into memory.

C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0

    C7             Op Code for Mov Immediate
    45             Target Register + use next 8 bits as displacement.
    F0             This is -10 hex.
    00 00 00 00    32 bits of 0.
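The byte 45 is the ModR/M byte, and its three bit fields explain both "target register" and "use next 8 bits as displacement". A sketch that pulls the fields apart for this one instruction follows; it hard-codes the layout of this particular encoding (opcode, ModR/M, 8-bit displacement, 32-bit immediate) and is not a general x86 decoder.

```python
instruction = bytes.fromhex("C7 45 F0 00 00 00 00")

opcode = instruction[0]            # C7: MOV r/m32, imm32
modrm = instruction[1]             # 45 hex = 01 000 101 binary

mod = modrm >> 6                   # 01: an 8-bit displacement follows
reg = (modrm >> 3) & 0b111         # 000: the "/0" opcode extension for C7
rm = modrm & 0b111                 # 101: EBP is the base register

# The displacement is a signed byte; the immediate is stored little-endian.
disp8 = int.from_bytes(instruction[2:3], "little", signed=True)
imm32 = int.from_bytes(instruction[3:7], "little")

print(f"mod={mod:02b} reg={reg:03b} rm={rm:03b} disp8={disp8} imm32={imm32}")
```

The displacement of -16 is the -10 hex from the annotation, giving the operand [ebp-10h].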

The Role of Compilers



Compiler goals:
    All correct programs execute correctly
    Most compiled programs execute fast (optimizations)
    Fast compilation
    Debugging support








The Role of Compilers: Steps In Compilation

Steps:
    Parsing --> intermediate representation
    Jump Optimization
    Loop Optimizations
    Register Allocation
    Code Generation --> assembly code

Optimizations named alongside the steps:
    Common Sub-Expression
    Procedure in-lining
    Constant Propagation
    Strength Reduction
    Pipeline Scheduling

The Role of Compilers: Steps In Compilation

Optimization Name    Explanation                                         % of the total number of optimizing transformations
High Level           At or near the source level; machine-independent    Not Measured
Local                Within Straight Line Code                           40%
Global               Across A Branch                                     42%
Machine Dependent    Depends on Machine Knowledge                        Not Measured

The Role of Compilers

What compiler writers want:
    regularity
    orthogonality
    composability

Compilers perform a giant case analysis
    too many choices make it hard

Orthogonal instruction sets
    operation, addressing mode, data type

One solution or all possible solutions
    2 branch conditions - eq, lt
    or all six - eq, ne, lt, gt, le, ge
    not 3 or 4

There are advantages to having instructions that are primitives. Let the compiler put the instructions together to make more complex sequences.

The MIPS Architecture


MIPS is very RISC oriented.

MIPS will be used for many examples throughout the course.

The MIPS Architecture: MIPS Characteristics

There's MIPS 32 that we learned in CS140:
    32-bit byte addresses, aligned
    Load/store - only displacement addressing
    Standard datatypes
    3 fixed length formats
    32 32-bit GPRs (r0 = 0)
    16 64-bit (32 32-bit) FPRs
    FP status register
    No Condition Codes

There's MIPS 64, the current arch.:
    Standard datatypes
    4 fixed length formats (8,16,32,64)
    32 64-bit GPRs (r0 = 0)
    32 64-bit FPRs

Addressing Modes
    Immediate
    Displacement
    (Register Mode used only for ALU)

Data transfer
    load/store word, load/store byte/halfword signed?
    load/store FP single/double
    moves between GPRs and FPRs

ALU
    add/subtract signed? immediate?
    multiply/divide signed?
    and, or, xor immediate?
    shifts: ll, rl, ra immediate?
    sets immediate?

The MIPS Architecture: MIPS Characteristics

Control
    branches == 0, != 0
    conditional branch testing FP bit
    jump, jump register
    jump & link, jump & link register
    trap, return-from-exception

Floating Point
    add/sub/mul/div
    single/double
    fp converts, fp set

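The MIPS Architecture: The MIPS Encoding

Bit positions are numbered from 31 (most significant) down to 0.

Register-Register
    bits 31-26: Op | 25-21: Rs1 | 20-16: Rs2 | 15-11: Rd | 10-0: Opx (the figure also marks a boundary between bits 6 and 5)

Register-Immediate
    bits 31-26: Op | 25-21: Rs1 | 20-16: Rd | 15-0: immediate

Branch
    bits 31-26: Op | 25-21: Rs1 | 20-16: Rs2/Opx | 15-0: immediate

Jump / Call
    bits 31-26: Op | 25-0: target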
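Because every format is a fixed 32 bits with fields at fixed positions, encoding and decoding reduce to shifts and masks. Here is a sketch for the Register-Immediate format only; the function names are illustrative, and the opcode value 8 used in the example is a placeholder rather than a claim about any real opcode assignment.

```python
def encode_reg_imm(op: int, rs1: int, rd: int, immediate: int) -> int:
    """Pack Op (bits 31-26), Rs1 (25-21), Rd (20-16), immediate (15-0) into one word."""
    return ((op & 0x3F) << 26) | ((rs1 & 0x1F) << 21) | ((rd & 0x1F) << 16) | (immediate & 0xFFFF)

def decode_reg_imm(word: int) -> dict:
    """Unpack the same four fields from a 32-bit instruction word."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rd": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }

# Hypothetical instruction: opcode 8, source register 1, destination register 4, immediate 100.
word = encode_reg_imm(op=8, rs1=1, rd=4, immediate=100)
print(f"{word:08X}", decode_reg_imm(word))
```

Fixed field positions are what let hardware read the register numbers before it has finished decoding the opcode.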

RISC versus CISC


BONUS

combines 3 features
    architecture
    implementation
    compilers and OS

argues that
    implementation effects are second order
    compilers are similar
    RISCs are better than CISCs: fair comparison?

RISC versus CISC


BONUS

RISC factor: {CPI VAX * Instr VAX} / {CPI MIPS * Instr MIPS}

Benchmark    Instruction Ratio    CPI MIPS    CPI VAX    CPI Ratio    RISC factor
li           1.6                  1.1         6.5        6.0          3.7
eqntott      1.1                  1.3         4.4        3.5          3.3
fpppp        2.9                  1.5         15.2       10.5         2.7
tomcatv      2.9                  2.1         17.5       8.2          2.9
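The RISC factor can be rewritten in terms of the two printed ratios. Assuming the Instruction Ratio column is MIPS instructions over VAX instructions and the CPI Ratio column is VAX CPI over MIPS CPI (the slide does not define either), the factor is the CPI ratio divided by the instruction ratio. The sketch below applies that reading to the table.

```python
# benchmark: (instruction ratio, CPI ratio, RISC factor), as printed in the table above.
rows = {
    "li": (1.6, 6.0, 3.7),
    "eqntott": (1.1, 3.5, 3.3),
    "fpppp": (2.9, 10.5, 2.7),
    "tomcatv": (2.9, 8.2, 2.9),
}

def risc_factor(cpi_ratio: float, instruction_ratio: float) -> float:
    """(CPI VAX * Instr VAX) / (CPI MIPS * Instr MIPS), from the two printed ratios."""
    return cpi_ratio / instruction_ratio

for name, (instr_ratio, cpi_ratio, printed) in rows.items():
    computed = risc_factor(cpi_ratio, instr_ratio)
    print(f"{name:8s} computed={computed:.2f} printed={printed}")
```

The li and tomcatv rows come out within rounding of the printed factor. The eqntott and fpppp rows do not reproduce exactly from the printed ratios, so the table's own values should be treated as the record.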

RISC versus CISC


BONUS

Compensating factors
    Increase VAX CPI but decrease VAX instruction count
    Increase MIPS instruction count
    e.g. 1: loads/stores versus operand specifiers
    e.g. 2: necessary complex instructions: loop branches

Factors favoring VAX
    Big immediate values
    Not-taken branches incur no delay

Factors favoring MIPS
    Operand specifier decoding
    Number of registers
    Separate floating point unit
    Simple branches/jumps (lower latency)
    No complex instructions
    Instruction scheduling
    Translation buffer
    Branch displacement size

Wrapup


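2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The MIPS Architecture

Bonus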