vishal-singh
8/8/2019 Computer Architecture Fundamentals
Chapter 1 - Fundamentals 1
Computer Architecture
Chapter 1: Fundamentals
Introduction
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
Art and Architecture
What's the difference between Art and Architecture?
Lyonel Feininger, Marktkirche in Halle
Art and Architecture
What's the difference between Art and Architecture?
Notre Dame de Paris
What's Computer Architecture?
"The attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
Amdahl, Blaauw, and Brooks, 1964
What's Computer Architecture?
1950s to 1960s: Computer Architecture Course
Computer Arithmetic.
1970s to mid 1980s: Computer Architecture Course
Instruction Set Design, especially ISA appropriate for compilers. (What we'll do in Chapter 2)
1990s to 2000s: Computer Architecture Course
Design of CPU, memory system, I/O system, Multiprocessors. (All evolving at a tremendous rate!)
The Task of a Computer Designer
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
[Figure: design cycle with three stages: Evaluate Existing Systems for Bottlenecks; Simulate New Designs and Organizations; Implement Next Generation System. Labels around the cycle: Technology Trends, Benchmarks, Workloads, Implementation Complexity.]
Technology and Computer Usage Trends
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
When building a Cathedral, numerous very practical considerations need to be taken into account:
available materials
worker skills
willingness of the client to pay the price.
Similarly, Computer Architecture is about working within constraints:
What will the market buy?
Cost/Performance
Tradeoffs in materials and processes
Trends
Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip doubles every year. In 1975 he revised the rate to a doubling about every two years.
Exponential growth has CONTINUED since then.
[Figure: Transistors Per Chip (log scale, 1.E+03 to 1.E+08) vs. year, 1970 to 2005. Labeled processors: 4004, 8086, 80286, 386, 486, Pentium, Pentium Pro, Pentium II, Pentium 3, Power PC 601, Power PC G3.]
Trends
Processor performance, as measured by the SPEC benchmark, has also risen dramatically.
[Figure: SPEC performance (0 to 5000) vs. year, 1987 to 2000. Labeled machines: Sun-4/260, MIPS M2000, IBM RS/6000, DEC AXP/500, DEC Alpha 4/266, DEC Alpha 5/500, DEC Alpha 21264/600, Alpha 6/833.]
Trends
Memory Capacity (and Cost) have changed dramatically in the last 20 years.
[Figure: memory size (log scale, 1000 to 1000000000) vs. year, 1970 to 2000.]
year    size (Mb)    cycle time
1980    0.0625       250 ns
1983    0.25         220 ns
1986    1            190 ns
1989    4            165 ns
1992    16           145 ns
1996    64           120 ns
2000    256          100 ns
Trends
CPU SPEED has increased dramatically, but memory and disk speed have increased only a little. This has led to dramatic changes in architecture, Operating Systems, and Programming practices.
         Capacity         Speed (latency)
Logic    2x in 3 years    2x in 3 years
DRAM     4x in 3 years    2x in 10 years
Disk     4x in 3 years    2x in 10 years
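To see how quickly the speed rates in this table diverge, the growth can be compounded over a fixed span. The sketch below is illustrative only: the helper name `growth_over` and the 10-year horizon are assumptions, not part of the slides.

```python
def growth_over(years, factor, period_years):
    """Total improvement after `years` when a quantity grows by
    `factor` every `period_years` (simple compounding)."""
    return factor ** (years / period_years)

# Speed (latency) rates from the table: logic 2x in 3 years, DRAM 2x in 10 years.
logic_speed = growth_over(10, 2, 3)
dram_speed = growth_over(10, 2, 10)
gap = logic_speed / dram_speed

print(f"After 10 years: logic {logic_speed:.1f}x, DRAM {dram_speed:.1f}x, gap {gap:.1f}x")
```

Under these assumptions the processor-memory speed gap widens by roughly a factor of five per decade, which is the pressure behind the memory hierarchy discussed later.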
Measuring And Reporting Performance
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
This section talks about:
1. Metrics - how do we describe in a numerical way the performance of a computer?
2. What tools do we use to find those metrics?
Metrics
Time to run the task (ExTime): Execution time, response time, latency
Tasks per day, hour, week, sec, ns (Performance): Throughput, bandwidth
Plane               DC to Paris    Speed       Passengers    Throughput (pmph)
Boeing 747          6.5 hours      610 mph     470           286,700
BAC/Sud Concorde    3 hours        1350 mph    132           178,200
Metrics - Comparisons
"X is n times faster than Y" means
$$ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} $$
Speed of Concorde vs. Boeing 747
Throughput of Boeing 747 vs. Concorde
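The two comparisons can be worked through with the Boeing 747 and Concorde figures from the Metrics slide. This is a small sketch; the dictionary layout and function names are illustrative choices.

```python
planes = {
    "Boeing 747": {"hours_dc_to_paris": 6.5, "speed_mph": 610, "passengers": 470},
    "Concorde": {"hours_dc_to_paris": 3.0, "speed_mph": 1350, "passengers": 132},
}

def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["speed_mph"]

def times_faster(extime_y, extime_x):
    """n such that 'X is n times faster than Y', from execution times."""
    return extime_y / extime_x

b747, concorde = planes["Boeing 747"], planes["Concorde"]

# Latency view: the Concorde finishes one trip sooner.
n_latency = times_faster(b747["hours_dc_to_paris"], concorde["hours_dc_to_paris"])

# Throughput view: the 747 moves more passenger-miles per hour.
n_throughput = throughput_pmph(b747) / throughput_pmph(concorde)

print(f"Concorde is {n_latency:.2f}x faster (latency)")
print(f"Boeing 747 is {n_throughput:.2f}x faster (throughput)")
```

Which plane is "faster" depends on the metric chosen: the Concorde wins on response time for a single trip, the 747 on throughput.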
Metrics - Comparisons
Pat has developed a new product, "rabbit", about which she wishes to determine performance. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:
Performance Comparisons
Product    Transactions / second    Seconds / transaction    Seconds to process transaction
Turtle     30                       0.0333                   3
Rabbit     60                       0.0166                   1
Which of the following statements reflect the performance comparison of rabbit and turtle?
o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
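One way to check the statements is to compute the ratios directly from the measurements. The sketch below is only an aid to the exercise: the variable names are illustrative, and it deliberately reports both the throughput view and the processing-time view, since the two columns lead to different answers.

```python
turtle = {"tps": 30, "seconds_to_process": 3}
rabbit = {"tps": 60, "seconds_to_process": 1}

# Throughput view (transactions per second).
throughput_ratio = rabbit["tps"] / turtle["tps"]
percent_faster = (throughput_ratio - 1) * 100

# Response-time view (seconds to process one transaction).
time_ratio = rabbit["seconds_to_process"] / turtle["seconds_to_process"]
percent_less_time = (1 - time_ratio) * 100
percent_longer = (turtle["seconds_to_process"] / rabbit["seconds_to_process"] - 1) * 100

print(f"Rabbit throughput is {throughput_ratio:.0f}x turtle's ({percent_faster:.0f}% higher)")
print(f"Rabbit takes {time_ratio:.2f} of turtle's time ({percent_less_time:.0f}% less)")
print(f"Turtle takes {percent_longer:.0f}% longer than rabbit")
```

Note that "percent less time" can never reach 100%, because that would mean the task takes no time at all.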
Metrics - Throughput
Levels of the system:
Application
Programming Language
Compiler
ISA
Datapath, Control
Function Units
Transistors, Wires, Pins
Throughput metrics used at different levels:
Answers per month
Operations per second
(millions) of Instructions per second: MIPS
(millions) of (FP) operations per second: MFLOP/s
Megabytes per second
Cycles per second (clock rate)
Methods For Predicting Performance
Benchmarks, Traces, Mixes
Hardware: Cost, delay, area, power estimation
Simulation (many levels)
ISA, RT, Gate, Circuit
Queuing Theory
Rules of Thumb
Fundamental Laws/Principles
Benchmarks
First Round 1989
10 programs yielding a single number (SPECmarks)
Second Round 1992
SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
Compiler Flags unlimited. March 93 of DEC 4000 Model 610:
spice: unix.c:/def=(sysv,has_bcopy,bcopy(a,b,c)=memcpy(b,a,c)
wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
Third Round 1995
new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
benchmarks useful for 3 years
Single flag setting for all programs: SPECint_base95, SPECfp_base95
SPEC: System Performance Evaluation Cooperative
Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):
Program        Language    What Is It
164.gzip       C           Compression
175.vpr        C           FPGA Circuit Placement and Routing
176.gcc        C           C Programming Language Compiler
181.mcf        C           Combinatorial Optimization
186.crafty     C           Game Playing: Chess
197.parser     C           Word Processing
252.eon        C++         Computer Visualization
253.perlbmk    C           PERL Programming Language
254.gap        C           Group Theory, Interpreter
255.vortex     C           Object-oriented Database
256.bzip2      C           Compression
300.twolf      C           Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000):
Program         Language      What Is It
168.wupwise     Fortran 77    Physics / Quantum Chromodynamics
171.swim        Fortran 77    Shallow Water Modeling
172.mgrid       Fortran 77    Multi-grid Solver: 3D Potential Field
173.applu       Fortran 77    Parabolic / Elliptic Differential Equations
177.mesa        C             3-D Graphics Library
178.galgel      Fortran 90    Computational Fluid Dynamics
179.art         C             Image Recognition / Neural Networks
183.equake      C             Seismic Wave Propagation Simulation
187.facerec     Fortran 90    Image Processing: Face Recognition
188.ammp        C             Computational Chemistry
189.lucas       Fortran 90    Number Theory / Primality Testing
191.fma3d       Fortran 90    Finite-element Crash Simulation
200.sixtrack    Fortran 77    High Energy Physics Accelerator Design
301.apsi        Fortran 77    Meteorology: Pollutant Distribution
http://www.spec.org/osg/cpu2000/CFP2000/
Benchmarks
Sample Results For SpecINT2000
Intel OR840 (1 GHz Pentium III processor)
               Base        Base        Base     Peak        Peak        Peak
Benchmarks     Ref Time    Run Time    Ratio    Ref Time    Run Time    Ratio
164.gzip       1400        277         505*     1400        270         518*
175.vpr        1400        419         334*     1400        417         336*
176.gcc        1100        275         399*     1100        272         405*
181.mcf        1800        621         290*     1800        619         291*
186.crafty     1000        191         522*     1000        191         523*
197.parser     1800        500         360*     1800        499         361*
252.eon        1300        267         486*     1300        267         486*
253.perlbmk    1800        302         596*     1800        302         596*
254.gap        1100        249         442*     1100        248         443*
255.vortex     1900        268         710*     1900        264         719*
256.bzip2      1500        389         386*     1500        375         400*
300.twolf      3000        784         382*     3000        776         387*
SPECint_base2000    438
SPECint2000         442
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
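The numbers in this report are consistent with each ratio being 100 times the reference time divided by the run time, and with the summary figure being the geometric mean of the twelve ratios. The sketch below recomputes the base results under that reading; the function names are illustrative, and small differences from the published ratios are expected because the listed run times are rounded.

```python
import math

# (reference time, base run time) in seconds, copied from the sample report.
base_results = {
    "164.gzip": (1400, 277), "175.vpr": (1400, 419), "176.gcc": (1100, 275),
    "181.mcf": (1800, 621), "186.crafty": (1000, 191), "197.parser": (1800, 500),
    "252.eon": (1300, 267), "253.perlbmk": (1800, 302), "254.gap": (1100, 249),
    "255.vortex": (1900, 268), "256.bzip2": (1500, 389), "300.twolf": (3000, 784),
}

def spec_ratio(ref_time, run_time):
    """Ratio of reference time to measured time, scaled by 100."""
    return 100 * ref_time / run_time

def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

ratios = {name: spec_ratio(ref, run) for name, (ref, run) in base_results.items()}
overall = geometric_mean(list(ratios.values()))

for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:6.0f}")
print(f"geometric mean of base ratios: {overall:.0f}")
```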
Benchmarks
Performance Evaluation
For better or worse, benchmarks shape a field
Good products are created when we have:
Good benchmarks
Good ways to summarize performance
Given that sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
If the benchmarks/summary are inadequate, then a company must choose between improving the product for real programs and improving the product to get more sales; sales almost always wins!
Execution time is the measure of computer performance!
Benchmarks
Management would like to have one number.
Technical people want more:
1. They want to have evidence of reproducibility - there should be enough information so that you or someone else can repeat the experiment.
2. There should be consistency when doing the measurements multiple times.
How to Summarize Performance
How would you report these results?
                       Computer A    Computer B    Computer C
Program P1 (secs)      1             10            20
Program P2 (secs)      1000          100           20
Total Time (secs)      1001          110           40
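Different summaries of the same measurements can rank the machines differently. The following sketch compares total time with a geometric mean of times normalized to Computer A; the choice of these two summaries and of A as the reference is an assumption made for illustration.

```python
import math

# Execution times in seconds for programs P1 and P2 on each computer.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}

def total_time(machine):
    """Sum of execution times: tracks actual time if each program runs once."""
    return sum(times[machine].values())

def normalized_geometric_mean(machine, reference="A"):
    """Geometric mean of per-program times relative to a reference machine."""
    ratios = [times[machine][p] / times[reference][p] for p in times[machine]]
    return math.prod(ratios) ** (1 / len(ratios))

for m in times:
    print(f"Computer {m}: total {total_time(m):5d} s, "
          f"normalized geometric mean {normalized_geometric_mean(m):.2f}")
```

By total time, C is clearly the fastest and A the slowest, yet the normalized geometric mean rates A and B as equal. This is why the choice of summary has to be reported along with the number.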
Quantitative Principles of Computer Design
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
Make the common case fast.
Amdahl's Law:
Relates total speedup of a system to the speedup of some portion of that system.
Amdahl's Law
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
Speedup due to enhancement E:
$$ \text{Speedup}(E) = \frac{\text{Execution Time Without Enhancement}}{\text{Execution Time With Enhancement}} = \frac{\text{Performance With Enhancement}}{\text{Performance Without Enhancement}} $$
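Amdahl's Law
$$ \text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right] $$
$$ \text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}} $$
[Figure: bars for ExTime_old and ExTime_new, with the enhanced fraction shrunk in ExTime_new.]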
Amdahl's Law
Floating point instructions improved to run 2X; but only 10% of actual instructions are FP.
$$ \text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old} $$
$$ \text{Speedup}_{overall} = \frac{1}{0.95} = 1.053 $$
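The floating point example can be checked with a direct transcription of Amdahl's Law. The function name and parameter names below are illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by `speedup_enhanced` and the rest is unchanged."""
    new_time = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1 / new_time

# 10% of the time is floating point, and floating point runs 2x faster.
fp_case = amdahl_speedup(0.10, 2)
print(f"overall speedup: {fp_case:.3f}")

# Even an infinitely fast FP unit is bounded by the untouched 90%.
fp_limit = amdahl_speedup(0.10, float("inf"))
print(f"upper bound: {fp_limit:.3f}")
```

The second call illustrates why the law argues for making the common case fast: the unenhanced fraction caps the achievable speedup at 1 / 0.9, no matter how large the enhancement.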
Quantitative Design
Cycles Per Instruction
CPI = (CPU Time * Clock Rate) / Instruction Count
    = Cycles / Instruction Count
$$ \text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i $$
where $I_i$ is the number of instructions of type $i$.
Instruction Frequency
$$ \text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}} $$
Invest Resources where time is Spent!
Quantitative Design
Cycles Per Instruction
Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.
Base Machine (Reg / Reg)
Op        Freq    Cycles    CPI(i)    (% Time)
ALU       50%     1         .5        (33%)
Load      20%     2         .4        (27%)
Store     10%     2         .2        (13%)
Branch    20%     2         .4        (27%)
Total CPI                   1.5
How do we get CPI(i)?
How do we get % time?
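Both questions can be answered from the frequency and cycle columns: CPI(i) is frequency times cycles, and % time is CPI(i) divided by the total CPI. A short sketch, with illustrative variable names:

```python
# (operation, frequency, cycles) for the base machine in the table.
instruction_mix = [
    ("ALU", 0.50, 1),
    ("Load", 0.20, 2),
    ("Store", 0.10, 2),
    ("Branch", 0.20, 2),
]

# Contribution of each instruction type to the overall CPI.
cpi_by_op = {op: freq * cycles for op, freq, cycles in instruction_mix}
total_cpi = sum(cpi_by_op.values())

# Share of execution time spent in each instruction type.
time_share = {op: cpi / total_cpi for op, cpi in cpi_by_op.items()}

for op in cpi_by_op:
    print(f"{op:7s} CPI(i) = {cpi_by_op[op]:.1f}  time = {time_share[op]:.0%}")
print(f"Total CPI = {total_cpi:.1f}")
```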
Quantitative Design
Locality of Reference
Programs access a relatively small portion of the address space at any instant of time.
There are two different types of locality:
Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)
Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight line code, array access, etc.)
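The effect of both kinds of locality can be seen with a toy cache model. The sketch below simulates a small direct-mapped cache; the cache geometry (64 blocks of 16 bytes) and the three access patterns are assumptions chosen for illustration, not taken from the slides.

```python
def hit_rate(addresses, num_blocks=64, block_size=16):
    """Fraction of accesses that hit in a direct-mapped cache.
    Each cache slot remembers which memory block it currently holds."""
    cache = [None] * num_blocks
    hits = 0
    for addr in addresses:
        block = addr // block_size      # which memory block is touched
        slot = block % num_blocks       # where that block lives in the cache
        if cache[slot] == block:
            hits += 1
        else:
            cache[slot] = block         # miss: bring the block in
    return hits / len(addresses)

# Spatial locality: walk an array of 4-byte elements in order.
sequential = [4 * i for i in range(4096)]

# Temporal locality: loop over a small 256-byte array 16 times.
small_loop = [4 * i for i in range(64)] * 16

# No locality: every access is 1024 bytes from the last and never repeats.
strided = [1024 * i for i in range(4096)]

print(f"sequential walk: {hit_rate(sequential):.2%}")
print(f"small loop:      {hit_rate(small_loop):.2%}")
print(f"large stride:    {hit_rate(strided):.2%}")
```

The sequential walk hits on three of every four accesses because neighbours share a block, the loop hits on nearly everything after the first pass, and the strided pattern never hits.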
The Concept of Memory Hierarchy
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
Fast memory is expensive.
Slow memory is cheap.
The goal is to minimize the price/performance for a particular price point.
Memory Hierarchy
Level            Managed By    Bandwidth (in MB/sec)    Access Time       Size
Registers        Compiler      10,000 - 50,000          1 nsec
Level 1 cache    Hardware      2000 - 5000              3 nsec
Level 2 Cache    Hardware      500 - 1000               15 nsec
Memory           OS            500 - 1000               150 nsec
Disk             OS/User       100                      5,000,000 nsec    > 5 Gigabytes
Memory Hierarchy
Hit: data appears in some block in the upper level (example: Block X)
Hit Rate: the fraction of memory accesses found in the upper level
Hit Time: Time to access the upper level, which consists of RAM access time + Time to determine hit/miss
Miss: data needs to be retrieved from a block in the lower level (Block Y)
Miss Rate = 1 - (Hit Rate)
Miss Penalty: Time to replace a block in the upper level + Time to deliver the block to the processor
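These definitions are commonly combined into an average access time: hit time plus miss rate times miss penalty. That formula is standard but does not appear on the slide, and the numbers below are illustrative assumptions rather than measurements.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per access for one level of the hierarchy:
    every access pays the hit time, and misses also pay the penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical upper level: 3 ns hit time, 80% hit rate, 15 ns miss penalty.
hit_rate = 0.80
miss_rate = 1 - hit_rate
example = average_access_time(hit_time=3, miss_rate=miss_rate, miss_penalty=15)
print(f"average access time: {example:.1f} ns")
```

For a multi-level hierarchy the same function can be nested, using the average access time of the lower level as the miss penalty of the level above it.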
Memory Hierarchy
Registers, Level 1 cache, Level 2 Cache, Memory, Disk
What is the cost of executing a program if:
Stores are free (there's a write pipe)
Loads are 20% of all instructions
80% of loads hit (are found) in the Level 1 cache
97% of loads hit in the Level 2 cache.
Wrap Up
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
Computer Architecture
Chapter 2: Instruction Sets
Introduction
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture
Bonus
Introduction
The Instruction Set Architecture is that portion of the machine visible to the assembly level programmer or to the compiler writer.
1. What are the advantages and disadvantages of various instruction set alternatives?
2. How do languages and compilers affect the ISA?
3. Use the DLX architecture as an example of a RISC architecture.
[Figure: the instruction set as the interface between software and hardware.]
Classifying Instruction Set
Architectures
Classifications can be by:
1. Stack/accumulator/register
2. Number of memory operands.
3. Number of total operands.
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
Instruction Set Architectures
Basic ISA Classes
Accumulator:
  1 address      add A        acc <- acc + mem[A]
  1+x address    addx A       acc <- acc + mem[A + x]
Stack:
  0 address      add          tos <- tos + next
General Purpose Register:
  2 address      add A B      EA(A) <- EA(A) + EA(B)
  3 address      add A B C    EA(A) <- EA(B) + EA(C)
  ALU Instructions can have two or three operands.
Load/Store:
  0 Memory       load R1, Mem1
                 load R2, Mem2
                 add R1, R2
  1 Memory       add R1, Mem2
  ALU Instructions can have 0, 1, 2, 3 operands. Shown here are cases of 0 and 1.
Instruction Set Architectures
Basic ISA Classes
Stack     Accumulator    Register             Register
                         (Register-memory)    (load-store)
Push A    Load A         Load R1, A           Load R1, A
Push B    Add B          Add R1, B            Load R2, B
Add       Store C        Store C, R1          Add R3, R1, R2
Pop C                                         Store C, R3
The results of the different address classes are easiest to see with the examples here, all of which implement the sequence for C = A + B.
Registers are the class that won out. The more registers on the CPU, the better.
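To make the difference between the classes concrete, the stack and accumulator sequences for C = A + B can be run on two toy interpreters. These are illustrative sketches only: the instruction tuples, function names, and memory dictionary are assumptions and do not model any real machine.

```python
def run_stack(program, memory):
    """Stack machine: operands are implicit, taken from the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(memory[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "pop":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack instruction: {op}")
    return memory

def run_accumulator(program, memory):
    """Accumulator machine: one implicit register, one memory operand."""
    acc = 0
    for op, name in program:
        if op == "load":
            acc = memory[name]
        elif op == "add":
            acc += memory[name]
        elif op == "store":
            memory[name] = acc
        else:
            raise ValueError(f"unknown accumulator instruction: {op}")
    return memory

stack_program = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
acc_program = [("load", "A"), ("add", "B"), ("store", "C")]

stack_result = run_stack(stack_program, {"A": 3, "B": 4, "C": 0})
acc_result = run_accumulator(acc_program, {"A": 3, "B": 4, "C": 0})
print(stack_result["C"], acc_result["C"])
```

Both programs compute the same result; what differs is how many operands each instruction has to name, which is the basis of the classification.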
Instruction Set Architectures
Intel 80x86 Integer Registers
GPR0    EAX       Accumulator
GPR1    ECX       Count register, string, loop
GPR2    EDX       Data Register; multiply, divide
GPR3    EBX       Base Address Register
GPR4    ESP       Stack Pointer
GPR5    EBP       Base Pointer for base of stack seg.
GPR6    ESI       Index Register
GPR7    EDI       Index Register
        CS        Code Segment Pointer
        SS        Stack Segment Pointer
        DS        Data Segment Pointer
        ES        Extra Data Segment Pointer
        FS        Data Seg. 2
        GS        Data Seg. 3
PC      EIP       Instruction Counter
        Eflags    Condition Codes
Memory Addressing
Sections Include:
Interpreting Memory Addresses
Addressing Modes
Displacement Address Mode
Immediate Address Mode
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
Memory Addressing
Interpreting Memory Addresses
What object is accessed as a function of the address and length?
Objects have byte addresses - an address refers to the number of bytes counted from the beginning of memory.
Little Endian - puts the byte whose address is xx00 at the least significant position in the word.
Big Endian - puts the byte whose address is xx00 at the most significant position in the word.
Alignment - data must be aligned on a boundary equal to its size.
Misalignment typically results in an alignment fault that must be handled by the Operating System.
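Byte order and alignment are easy to demonstrate with Python's built-in integer conversions. The value 0x12345678 and the helper `is_aligned` are illustrative choices.

```python
value = 0x12345678

# Byte at the lowest address comes first in each bytes object.
little = value.to_bytes(4, "little")  # least significant byte at lowest address
big = value.to_bytes(4, "big")        # most significant byte at lowest address

print("little endian:", little.hex(" "))
print("big endian:   ", big.hex(" "))

def is_aligned(address, size):
    """True when `address` falls on a boundary equal to the object's size."""
    return address % size == 0

print(is_aligned(0x1008, 4))  # word on a 4-byte boundary
print(is_aligned(0x1006, 4))  # word straddling a boundary
```

The same four bytes read back with the wrong byte order give a different number, which is why byte order matters whenever data moves between machines.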
Memory Addressing
Addressing Modes
This table shows the most common modes. A more complete set is in Figure 2.6.
Used for static data.
Mode                 Example                 Meaning
Register             add r4, r3              R[4] <- R[4] + R[3]
Immediate            add r4, #3              R[4] <- R[4] + 3
Displacement         add r4, 100(r1)         R[4] <- R[4] + M[100 + R[1]]
Register indirect    add r4, (r1)            R[4] <- R[4] + M[R[1]]
Indexed              add r3, (r1+r2)         R[3] <- R[3] + M[R[1] + R[2]]
Direct/Absolute      add r1, (1001)          R[1] <- R[1] + M[1001]
Memory indirect      add r1, @(r3)           R[1] <- R[1] + M[M[R[3]]]
Autoincrement        add r1, (r2)+           R[1] <- R[1] + M[R[2]]; R[2] <- R[2] + d
Autodecrement        add r1, -(r2)           R[2] <- R[2] - d; R[1] <- R[1] + M[R[2]]
Scaled               add r1, 100(r2)[r3]     R[1] <- R[1] + M[100 + R[2] + R[3]*d]
( ) memory access    [ ] accessing a Register or Memory location
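The address arithmetic behind several of these modes can be sketched with small helper functions. Everything here is illustrative: the register contents, the function names, and the operand size d = 4 are assumptions.

```python
# Hypothetical register file: register number -> contents.
R = {1: 200, 2: 300, 3: 5}

def displacement(regs, disp, base):
    """Displacement mode, e.g. 100(r1): address = disp + R[base]."""
    return disp + regs[base]

def register_indirect(regs, base):
    """Register indirect mode, e.g. (r1): address = R[base]."""
    return regs[base]

def indexed(regs, base, index):
    """Indexed mode, e.g. (r1+r2): address = R[base] + R[index]."""
    return regs[base] + regs[index]

def scaled(regs, disp, base, index, d):
    """Scaled mode, e.g. 100(r2)[r3]: address = disp + R[base] + R[index]*d."""
    return disp + regs[base] + regs[index] * d

print(displacement(R, 100, 1))      # 100(r1)
print(register_indirect(R, 1))      # (r1)
print(indexed(R, 1, 2))             # (r1+r2)
print(scaled(R, 100, 2, 3, d=4))    # 100(r2)[r3] with 4-byte elements
```

Register indirect is the special case of displacement mode with a displacement of zero, which is one reason a load/store machine can get by with displacement addressing alone.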
Memory Addressing
Displacement Addressing Mode
How big should the displacement be?
For addresses that do fit in the displacement size:
Add R4, 10000 (R0)
For addresses that don't fit in the displacement size, the compiler must do the following:
Load R1, address
Add R4, 0 (R1)
How big the field should be depends on the displacements that typically occur.
On both IA32 and DLX, the space allocated is 16 bits.
Memory Addressing
Immediate Address Mode
Used where we want to get to a numerical value in an instruction.
So how would you get a 32 bit value into a register?
At high level:
a = b + 3;
if ( a > 17 )
goto Addr
At Assembler level:
Load R2, 3
Add R0, R1, R2
Load R2, 17
CMPBGT R1, R2
Load R1, Address
Jump (R1)
Operations In The
Instruction Set
Sections Include:
Detailed information about typesof instructions.
Instructions for Control Flow(conditional branches, jumps)
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
Operations In The Instruction Set
Operator Types
Arithmetic and logical - and, add
Data transfer - move, load
Control - branch, jump, call
System - system call, traps
Floating point - add, mul, div, sqrt
Decimal - add, convert
String - move, compare
Multimedia - 2D, 3D? e.g., Intel MMX and Sun VIS
Operations In The
Instruction Set
Control Instructions Issues:
taken or not
where is the target
link return address
save or restore
Instructions that change the PC:
(conditional) branches, (unconditional) jumps
function calls, function returns
system calls, system returns
Conditional branches are 20% of all instructions!!
Operations In The Instruction Set
Control Instructions
There are numerous tradeoffs:
Compare and branch
+ no extra compare, no state passed between instructions
-- requires ALU op, restricts code scheduling opportunities
Implicitly set condition codes - Z, N, V, C
+ can be set ``for free''
-- constrains code reordering, extra state to save/restore
Explicitly set condition codes
+ can be set ``for free'', decouples branch/fetch from pipeline
-- extra state to save/restore
Condition in general-purpose register
+ no special state but uses up a register
-- branch condition separate from branch logic in pipeline
Some data for MIPS
> 80% branches use immediate data, > 80% of those zero
50% branches use == 0 or <> 0
Compromise in MIPS
branch==0, branch<>0
compare instructions for all other compares
Operations In The Instruction Set
Control Instructions
Link Return Address:
implicit register - many recent architectures use this
+ fast, simple
-- s/w save register before next call, surprise traps?
explicit register
+ may avoid saving register
-- register must be specified
processor stack
+ recursion direct
-- complex instructions
Save or restore state:
What state?
function calls: registers
system calls: registers, flags, PC, PSW, etc
Hardware need not save registers
Caller can save registers in use
Callee save registers it will use
Hardware register save
IBM STM, VAX CALLS
Faster?
Many recent architectures do no register saving
Or do implicit register saving with register windows (SPARC)
Type And Size of Operands
The type of the operand is usually encoded in the Opcode - a LDW implies loading of a word.
Common sizes are:
Character (1 byte)
Half word (16 bits)
Word (32 bits)
Single Precision Floating Point (1 Word)
Double Precision Floating Point (2 Words)
Integers are two's complement binary.
Floating point is IEEE 754.
Some languages (like COBOL) use packed decimal.
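The sizes and encodings listed above can be inspected with Python's standard `struct` module, using its standard-size formats (the `<` and `>` prefixes). The mapping of slide terms to format characters is an assumption made for illustration.

```python
import struct

# Standard-size struct formats standing in for the operand types on the slide.
formats = {
    "Character": "<b",
    "Half word": "<h",
    "Word": "<i",
    "Single Precision Floating Point": "<f",
    "Double Precision Floating Point": "<d",
}

sizes = {name: struct.calcsize(fmt) for name, fmt in formats.items()}
for name, size in sizes.items():
    print(f"{name}: {size} byte(s)")

# Two's complement: -1 as a 32-bit word is all ones.
minus_one = struct.pack("<i", -1)
print(minus_one.hex())

# IEEE 754 single precision: 1.0 is sign 0, exponent 127, fraction 0.
one_point_zero = struct.pack(">f", 1.0)
print(one_point_zero.hex())
```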
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
Encoding And Instruction Set
This section has to do with how an assembly level instruction is encoded into binary.
Ultimately, it's the binary that is read and interpreted by the machine.
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
We will be using the Intel instruction set, which is defined at: http://developer.intel.com/design/Pentium4/manuals.
Volume 2 has the instruction set.
Encoding And Instruction Set
80x86 Instruction Encoding
for ( index = 0; index < iterations; index++ )
0040D3AF C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0
0040D3B6 EB 09 jmp main+0D1h (0040d3c1)
0040D3B8 8B 4D F0 mov ecx,dword ptr [ebp-10h]
0040D3BB 83 C1 01 add ecx,1
0040D3BE 89 4D F0 mov dword ptr [ebp-10h],ecx
0040D3C1 8B 55 F0 mov edx,dword ptr [ebp-10h]
0040D3C4 3B 55 F8 cmp edx,dword ptr [ebp-8]
0040D3C7 7D 15 jge main+0EEh (0040d3de)
long_temp = (*alignment + long_temp) % 47;
0040D3C9 8B 45 F4 mov eax,dword ptr [ebp-0Ch]
0040D3CC 8B 00 mov eax,dword ptr [eax]
0040D3CE 03 45 EC add eax,dword ptr [ebp-14h]
0040D3D1 99 cdq
0040D3D2 B9 2F 00 00 00 mov ecx,2Fh
0040D3D7 F7 F9 idiv eax,ecx
0040D3D9 89 55 EC mov dword ptr [ebp-14h],edx
0040D3DC EB DA jmp main+0C8h (0040d3b8)
Here's some sample code that's been disassembled. It was compiled with the debugger option so is not optimized.
This code was produced using Visual Studio.
Encoding And Instruction Set
80x86 Instruction Encoding
for ( index = 0; index < iterations; index++ )
00401000 8B 0D 40 54 40 00 mov ecx,dword ptr ds:[405440h]
00401006 33 D2 xor edx,edx
00401008 85 C9 test ecx,ecx
0040100A 7E 14 jle 00401020
0040100C 56 push esi
0040100D 57 push edi
0040100E 8B F1 mov esi,ecx
long_temp = (*alignment + long_temp) % 47;
00401010 8D 04 11 lea eax,[ecx+edx]
00401013 BF 2F 00 00 00 mov edi,2Fh
00401018 99 cdq
00401019 F7 FF idiv eax,edi
0040101B 4E dec esi
0040101C 75 F2 jne 00401010
0040101E 5F pop edi
0040101F 5E pop esi
00401020 C3 ret
Here's some sample code that's been disassembled. It was compiled with optimization.
This code was produced using Visual Studio.
Encoding And Instruction Set
80x86 Instruction Encoding
for ( index = 0; index < iterations; index++ )
0x804852f : add $0x10,%esp
0x8048532 : lea 0xfffffff8(%ebp),%edx
0x8048535 : test %esi,%esi
0x8048537 : jle 0x8048543
0x8048539 : mov %esi,%eax
0x804853b : nop
0x804853c : lea 0x0(%esi,1),%esi
long_temp = (*alignment + long_temp) % 47;
0x8048540 : dec %eax
0x8048541 : jne 0x8048540
0x8048543 : add $0xfffffff4,%esp
Here's some sample code that's been disassembled. It was compiled with optimization.
This code was produced using gcc and gdb. For details, see Lab 2.1.
Note that the representation of the code is dependent on the compiler/debugger!
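Encoding And Instruction Set
80x86 Instruction Encoding
Field widths in bits:
ADD:     ADD (4) | Reg (3) | W (1) | Disp. (8)
SHL:     SHL (6) | V/w (2) | postbyte (8) | Disp. (8)
TEST:    TEST (7) | W (1) | postbyte (8) | Immediate (8)
A Morass of disjoint encoding!!
This is Figure D.8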
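Encoding And Instruction Set
80x86 Instruction Encoding
Field widths in bits:
JE:       JE (4) | Cond (4) | Disp. (8)
CALLF:    CALLF (8) | Offset (16) | Segment Number (16)
MOV:      MOV (6) | D/w (2) | postbyte (8) | Disp. (8)
PUSH:     PUSH (5) | Reg (3)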
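The simplest of these formats, a 5-bit opcode followed by a 3-bit register number, can be checked against the optimized disassembly shown earlier (56 push esi, 57 push edi, 5F pop edi, 5E pop esi). The decoder below is a sketch that handles only these two one-byte instruction forms; the function name is illustrative.

```python
# Register numbers follow the GPR0..GPR7 order of the 80x86 register table.
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Upper five bits of the opcode byte select the operation.
OPCODES = {0b01010: "push", 0b01011: "pop"}

def decode_push_pop(byte):
    """Split one byte into a 5-bit opcode and a 3-bit register field."""
    opcode = byte >> 3
    reg = byte & 0b111
    if opcode not in OPCODES:
        raise ValueError(f"not a one-byte push/pop: {byte:#04x}")
    return f"{OPCODES[opcode]} {REGISTERS[reg]}"

for byte in (0x56, 0x57, 0x5F, 0x5E):
    print(f"{byte:02X}  {decode_push_pop(byte)}")
```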
Instruction Set Encoding
C7 /0 MOV r/m32,imm32 Move an immediate 32 bit data item to a register or to memory.
Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, or a doubleword.
In our case, because of the C7 Opcode, we know it's a sub-flavor of MOV putting an immediate value into memory.
Here's the instruction that we had several pages ago:
0040D3AF C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0
It is described in:
http://developer.intel.com/design/pentium4/manuals/245471.htm
(I found it on page 479, but this is obviously version dependent.)
C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0
C7             Op Code for Mov Immediate
45             Target Register + use next 8 bits as displacement.
F0             This is -10 hex.
00 00 00 00    32 bits of 0.
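The byte-by-byte reading above can be reproduced by splitting the second byte into its three bit fields (2-bit mode, 3-bit register/opcode extension, 3-bit register/memory) and interpreting the remaining bytes as a signed 8-bit displacement and a little-endian 32-bit immediate. This sketch handles only this one instruction form, and the variable names are illustrative.

```python
import struct

REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

code = bytes.fromhex("C7 45 F0 00 00 00 00")

opcode = code[0]                 # C7: MOV r/m32, imm32
postbyte = code[1]               # 45: addressing information
mod = postbyte >> 6              # 01: register + 8-bit displacement
reg = (postbyte >> 3) & 0b111    # 000: the "/0" opcode extension
rm = postbyte & 0b111            # 101: base register EBP

disp8 = struct.unpack("<b", code[2:3])[0]   # F0 as a signed byte
imm32 = struct.unpack("<I", code[3:7])[0]   # little-endian 32-bit immediate

print(f"opcode={opcode:02X} mod={mod:02b} reg={reg:03b} rm={rm:03b}")
print(f"mov dword ptr [{REGISTERS[rm]}{disp8:+#x}], {imm32}")
```

The signed reading of F0 is -16 decimal, which is the -10h shown in the disassembly.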
The Role of Compilers
Compiler goals:
All correct programs execute correctly
Most compiled programs execute fast (optimizations)
Fast compilation
Debugging support
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
The Role of Compilers
Steps In Compilation
Parsing --> intermediate representation
Jump Optimization
Loop Optimizations
Register Allocation
Code Generation --> assembly code
Common Sub-Expression Elimination
Procedure In-lining
Constant Propagation
Strength Reduction
Pipeline Scheduling
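Two of the optimizations listed above, constant propagation and strength reduction, can be sketched on straight-line code. Everything here is an assumption made for illustration: the tuple-based intermediate representation `(dest, op, a, b)`, the function name `optimize`, and the restriction to `+` and `*` are not from the slides, and a real compiler pass is far more involved.

```python
def optimize(ir):
    """One forward pass over straight-line three-address code (illustrative only).

    Each instruction is (dest, op, a, b); operands are ints or variable names.
    Returns the remaining instructions and the variables found to be constant.
    """
    consts = {}  # variables currently known to hold a constant value
    out = []
    for dest, op, a, b in ir:
        if op not in ("+", "*"):
            raise ValueError("this sketch only understands + and *")
        # Constant propagation: replace known variables by their values.
        a = consts.get(a, a)
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            # Both operands known: fold at compile time, emit nothing.
            consts[dest] = a + b if op == "+" else a * b
            continue
        consts.pop(dest, None)  # dest no longer holds a known constant
        # Strength reduction: multiply by a power of two becomes a shift.
        if op == "*" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
            op, b = "<<", b.bit_length() - 1
        out.append((dest, op, a, b))
    return out, consts


program = [
    ("n", "+", 4, 4),      # n = 4 + 4
    ("m", "*", "n", 2),    # m = n * 2
    ("y", "*", "x", "m"),  # y = x * m
    ("z", "+", "y", "n"),  # z = y + n
]
optimized, known = optimize(program)
print(optimized, known)
```

Four instructions shrink to two: `n` and `m` are computed at compile time, and the multiply by 16 becomes a shift by 4.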
The Role of Compilers
Steps In Compilation
Optimization Name    Explanation                                        % of the total number of optimizing transformations
High Level           At or near the source level; machine-independent   Not Measured
Local                Within Straight Line Code                          40%
Global               Across A Branch                                    42%
Machine Dependent    Depends on Machine Knowledge                       Not Measured
The Role of Compilers
What compiler writers want:
regularity
orthogonality
composability
Compilers perform a giant case analysis; too many choices make it hard.
Orthogonal instruction sets: operation, addressing mode, and data type can be chosen independently.
One solution or all possible solutions:
2 branch conditions - eq, lt
or all six - eq, ne, lt, gt, le, ge
not 3 or 4
There are advantages to having instructions that are primitives. Let the compiler put the instructions together to make more complex sequences.
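The "2 branch conditions or all six" point can be made concrete: with only eq and lt as primitives, the other four comparisons follow by swapping operands or negating the result, which is the kind of composition a compiler can do on its own. A minimal sketch, with the Python function names chosen here only to mirror the condition names on the slide:

```python
# The two primitive conditions.
def eq(a, b):
    return a == b


def lt(a, b):
    return a < b


# The remaining four, built only from eq and lt.
def ne(a, b):
    return not eq(a, b)


def gt(a, b):
    return lt(b, a)       # swap the operands


def le(a, b):
    return not lt(b, a)   # a <= b  is  not (b < a)


def ge(a, b):
    return not lt(a, b)   # a >= b  is  not (a < b)


print(gt(3, 2), le(2, 2), ge(1, 2), ne(1, 1))
```

The derivations assume a total order, as with integer comparison.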
The MIPS Architecture
MIPS is very RISC oriented.
MIPS will be used for many examples throughout the course.
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture
The MIPS Architecture
MIPS Characteristics
There's MIPS 32, which we learned in CS140:
32-bit byte addresses, aligned
Load/store - only displacement addressing
Standard datatypes
3 fixed length formats
32 32-bit GPRs (r0 = 0)
16 64-bit (32 32-bit) FPRs
FP status register
No Condition Codes
Data transfer
load/store word, load/store byte/halfword signed?
load/store FP single/double
moves between GPRs and FPRs
ALU
add/subtract signed? immediate?
multiply/divide signed?
and, or, xor immediate?
shifts: ll, rl, ra immediate?
sets immediate?
There's MIPS 64, the current architecture:
Standard datatypes
4 fixed length formats (8,16,32,64)
32 64-bit GPRs (r0 = 0)
64 64-bit FPRs
Addressing Modes
Immediate
Displacement
(Register Mode used only for ALU)
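Displacement addressing combined with r0 = 0 covers more cases than it first appears. The sketch below assumes 32-bit addresses and a 16-bit signed displacement (the width of the immediate field in the encoding); the helper names `sign_extend16` and `effective_address` are invented for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep addresses within 32 bits


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(regs, base, disp16):
    """Address = contents of base register + sign-extended displacement."""
    base_value = 0 if base == 0 else regs[base]  # r0 always reads as 0
    return (base_value + sign_extend16(disp16)) & MASK32


regs = [0] * 32
regs[29] = 0x7FFF0000

print(hex(effective_address(regs, 29, 0xFFF0)))  # base register minus 16
print(hex(effective_address(regs, 0, 0x0100)))   # r0 as base: an absolute address
print(hex(effective_address(regs, 29, 0)))       # zero displacement: register indirect
```

Using r0 as the base gives absolute addressing, and a displacement of 0 gives register-indirect addressing, so the single mode stands in for three.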
The MIPS Architecture
MIPS Characteristics
Control
branches == 0, != 0
conditional branch testing FP bit
jump, jump register
jump & link, jump & link register
trap, return-from-exception
Floating Point
add/sub/mul/div, single/double
fp converts, fp set
The MIPS Architecture
The MIPS Encoding
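Format               Fields (bit positions)
Register-Register    Op (31-26) | Rs1 (25-21) | Rs2 (20-16) | Rd (15-11) | unlabeled (10-6) | Opx (5-0)
Register-Immediate   Op (31-26) | Rs1 (25-21) | Rd (20-16) | immediate (15-0)
Branch               Op (31-26) | Rs1 (25-21) | Rs2/Opx (20-16) | immediate (15-0)
Jump / Call          Op (31-26) | target (25-0)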
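Because every field sits at a fixed bit position, encoding and decoding reduce to shifts and masks. The sketch below packs the Register-Immediate format (Op in bits 31-26, Rs1 in 25-21, Rd in 20-16, immediate in 15-0). The function names and the opcode value 0x08 are placeholders chosen for illustration, not taken from the slides.

```python
def encode_reg_imm(op, rs1, rd, imm):
    """Pack Op | Rs1 | Rd | immediate into one 32-bit word."""
    if not (0 <= op < 64 and 0 <= rs1 < 32 and 0 <= rd < 32):
        raise ValueError("field out of range")
    if not -0x8000 <= imm <= 0x7FFF:
        raise ValueError("immediate does not fit in 16 signed bits")
    return (op << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


def decode_reg_imm(word):
    """Unpack a 32-bit word into (op, rs1, rd, signed immediate)."""
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, imm


# Placeholder opcode 0x08, source register 29, destination register 8, immediate -16.
word = encode_reg_imm(0x08, 29, 8, -16)
print(hex(word), decode_reg_imm(word))
```

A decoder for a fixed-length format never has to look at one field to find where the next one starts, in contrast to the 80x86 encodings shown earlier.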
RISC versus CISC
BONUS
combines 3 features
architecture
implementation
compilers and OS
argues that
implementation effects are second order
compilers are similar
RISCs are better than CISCs: fair comparison?
RISC versus CISC
BONUS
RISC factor = (CPI VAX * Instr VAX) / (CPI MIPS * Instr MIPS)
Benchmark   Instruction Ratio   CPI MIPS   CPI VAX   CPI Ratio   RISC factor
li          1.6                 1.1        6.5       6.0         3.7
eqntott     1.1                 1.3        4.4       3.5         3.3
fpppp       2.9                 1.5        15.2      10.5        2.7
tomcatv     2.9                 2.1        17.5      8.2         2.9
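The RISC factor formula can be rearranged as the CPI ratio divided by the instruction ratio, which is easy to check against a table row. The sketch assumes that the "Instruction Ratio" column means Instr MIPS / Instr VAX (the slide does not say so explicitly); the function name `risc_factor` is made up for illustration.

```python
def risc_factor(instr_ratio, cpi_mips, cpi_vax):
    """(CPI_VAX * Instr_VAX) / (CPI_MIPS * Instr_MIPS).

    instr_ratio is assumed to be Instr_MIPS / Instr_VAX.
    """
    cpi_ratio = cpi_vax / cpi_mips
    return cpi_ratio / instr_ratio


# The 'li' row: instruction ratio 1.6, CPI MIPS 1.1, CPI VAX 6.5.
li = risc_factor(1.6, 1.1, 6.5)
print(round(li, 1))
```

For li, MIPS executes about 1.6 times as many instructions but each takes roughly one sixth as many cycles, so it comes out ahead by a factor of about 3.7. Since the table entries are rounded, recomputing from them will not always match the printed last digit.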
RISC versus CISC
BONUS
Compensating factors
Increase VAX CPI but decrease VAX instruction count
Increase MIPS instruction count
e.g. 1: loads/stores versus operand specifiers
e.g. 2: necessary complex instructions: loop branches
Factors favoring VAX
Big immediate values
Not-taken branches incur no delay
Factors favoring MIPS
Operand specifier decoding
Number of registers
Separate floating point unit
Simple branches/jumps (lower latency)
No complex instructions
Instruction scheduling
Translation buffer
Branch displacement size
Wrapup
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture
Bonus