CSE 141 – Computer Architecture
Summer Session I, 2005
Lecture 3
Performance and Single Cycle CPU Part 1
Pramod V. Argade
2Pramod Argade UCSD CSE 141, Spring 2005
CSE141: Introduction to Computer Architecture
Instructor: Pramod V. Argade ([email protected])
Office Hour:
– Tue. 7:30 - 9:00 (Center 105)
– Wed. 5:00 - 6:00 PM (HSS 1330)
– By Appointment
TAs: Chengmo Yang: [email protected] Rao: [email protected]
Lecture: Mon/Wed. 6:00 - 8:50 PM, HSS 1330
Textbook: Computer Organization & Design: The Hardware/Software Interface, 3rd Edition. Authors: Patterson and Hennessy
Web-page: http://www.cse.ucsd.edu/classes/su05/cse141

Summer Session I, 2005: CSE141 Course Schedule
Lecture # | Date | Time | Room | Topic | Quiz topic | Homework Due
1 | Mon. 6/27 | 6 - 8:50 PM | HSS 1330 | Introduction, Ch. 1; ISA, Ch. 2 | - | -
2 | Wed. 6/29 | 6 - 8:50 PM | HSS 1330 | Arithmetic, Ch. 3 | ISA, Ch. 2 | #1
- | Mon. 7/4 | No Class | | July 4th Holiday | - | -
3 | Wed. 7/6 | 6 - 8:50 PM | HSS 1330 | Performance, Ch. 4; Single-cycle CPU, Ch. 5 | Arithmetic, Ch. 3 | #2
4 | Mon. 7/11 | 6 - 8:50 PM | HSS 1330 | Single-cycle CPU, Ch. 5 Cont.; Multi-cycle CPU, Ch. 5 | Performance, Ch. 4 | #3
5 | Tue. 7/12 | 7:30 - 8:50 PM | HSS 1330 | Multi-cycle CPU, Ch. 5 Cont. (July 4th make up class) | - | -
6 | Wed. 7/13 | 6 - 8:50 PM | HSS 1330 | Single and Multicycle CPU Examples and Review for Midterm | Single-cycle CPU, Ch. 5 | -
7 | Mon. 7/18 | 6 - 8:50 PM | HSS 1330 | Mid-term Exam; Exceptions | - | #4
8 | Tue. 7/19 | 7:30 - 8:50 PM | HSS 1330 | Pipelining, Ch. 6 (July 4th make up class) | - | -
9 | Wed. 7/20 | 6 - 8:50 PM | HSS 1330 | Hazards, Ch. 6 | - | -
10 | Mon. 7/25 | 6 - 8:50 PM | HSS 1330 | Memory Hierarchy & Caches, Ch. 7 | Hazards, Ch. 6 | #5
11 | Wed. 7/27 | 6 - 8:50 PM | HSS 1330 | Virtual Memory, Ch. 7; Course Review | Cache, Ch. 7 | #6
12 | Sat. 7/30 | TBD | TBD | Final Exam | - | -

Announcements
Discussion Sections for 141: Thursday, 7:30 - 8:30, WLH-2205
Office Hour:
– Tue. 7:30 - 9:00 (Center 105)
– Wed. 5:00 - 6:00 PM (HSS 1330)
– By Appointment
Reading Assignment
– Chapter 4. Assessing and Understanding Performance, Sections 4.1 - 4.6
Homework 3: Due Mon., July 11th in class
– 4.1, 4.2, 4.6, 4.7, 4.8, 4.9, 4.12, 4.19, 4.22, 4.45
Quiz
– When: Mon., July 11th, first 10 minutes of the class
– Topic: Performance, Chapter 4
– Need: Paper, pen

Performance Assessment
Example definitions of performance
– Passengers * MPH
– Cruising speed
– Passenger capacity
– (Passenger * MPH)/$: need more data!
Clearly formulate your definition of performance
Airplane | Passenger Capacity | Cruising Range (miles) | Cruising Speed (m.p.h.) | Passenger Throughput (passenger x m.p.h.)
Boeing 777 | 375 | 4630 | 610 | 228,750
Boeing 747 | 470 | 4150 | 610 | 286,700
Douglas DC-8-50 | 146 | 8720 | 544 | 79,424
Airbus A380 | 555 | 8000 | 645 | 357,975

Why worry about performance?
Learn to measure, report, and summarize
Make intelligent choices
See through the marketing hype
As a designer or purchaser, assess:
– which system has best performance?
– which system has lowest cost?
– which system has highest performance/cost?
Formulate metrics for performance measurement– How to report relative performance?
Understand impact of architectural choices on performance

Computer Performance: TIME, TIME, TIME
Response Time (latency)
– How long does it take for my job to run?
– How long does it take to execute a job?
– How long must I wait for the database query?
Throughput
– How many jobs can the machine run at once?
– What is the average execution rate?
– How much work is getting done?
If we upgrade a machine with a new processor what do we decrease?
If we add a new machine to the lab what do we increase?

Execution Time
% time program
... program results ...
90.7u 12.9s 2:39 65%
Elapsed Time (2:39)
– Counts everything (disk and memory accesses, I/O, etc.)
– A useful number, but often not good for comparison purposes
Total CPU time (103.6 s)
– Doesn't count I/O or time spent running other programs
– Comprises system time (12.9 s) and user time (90.7 s)
Our focus: user CPU time
– Time spent executing the lines of code that are "in" our program
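The numbers in the sample `time` output fit together arithmetically. The sketch below is only an illustration: the function name `cpu_utilization` is made up here, and it assumes the reported percentage is total CPU time divided by elapsed time.

```python
def cpu_utilization(user_s, system_s, elapsed_s):
    """Return (total CPU seconds, fraction of elapsed time spent on the CPU)."""
    total_cpu = user_s + system_s
    return total_cpu, total_cpu / elapsed_s

# Values from the slide: 90.7u 12.9s 2:39
elapsed = 2 * 60 + 39                      # 2:39 is 159 seconds
total, frac = cpu_utilization(90.7, 12.9, elapsed)
print(f"total CPU time = {total:.1f} s")   # user + system
print(f"CPU share      = {frac:.0%}")      # matches the 65% reported by time
```

The remaining share of the elapsed time went to I/O and to other programs, which is why elapsed time is a poor basis for comparing processors.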

A Definition of Performance
For some program running on machine X:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
“Machine X is n times faster than Y”:
$$n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$$
Problem: Execution time: Machine A: 12 seconds, B: 20 seconds
– A/B = .6, so A is 40% faster, or 1.4X faster, or B is 40% slower
– B/A = 1.67, so A is 67% faster, or 1.67X faster, or B is 67% slower
Need a precise definition: “X is n times faster than Y”, n > 1
– A is 1.67 times faster than B
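As a quick check of the definition, the ratio for machines A and B can be computed directly. This is a minimal sketch; `times_faster` is a hypothetical helper name, not something from the course material.

```python
def times_faster(exec_time_x, exec_time_y):
    """Return n such that X is n times faster than Y.

    Uses n = Performance_X / Performance_Y = Execution time_Y / Execution time_X.
    """
    return exec_time_y / exec_time_x

# Machine A takes 12 s, machine B takes 20 s
n = times_faster(12.0, 20.0)
print(f"A is {n:.2f} times faster than B")
```

Putting the slower machine's time in the numerator keeps n above 1, which is the convention the slide settles on.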

Clock Cycles
Operation of a conventional computer is controlled by clock “ticks”
Clock rate (frequency) = cycles per second (Hertz)
– MHz = Million cycles per second
– GHz = Giga (Billion) cycles per second
Cycle time = time between ticks = seconds per cycle
– A 2 GHz clock => Cycle time = 1/(2 x 10^9) s = 0.5 nanoseconds
Instead of reporting execution time in seconds, we often use clock cycles
[Figure: clock ticks along a time axis]
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
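The relation between clock rate, cycle time, and execution time can be tried with a few numbers. In this sketch the helper names and the 10-billion-cycle program are illustrative assumptions; only the 2 GHz example comes from the slide.

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time in nanoseconds for a clock rate given in Hz."""
    return 1e9 / clock_rate_hz

def execution_time_s(cycles, clock_rate_hz):
    """seconds/program = cycles/program x seconds/cycle."""
    return cycles * (1.0 / clock_rate_hz)

two_ghz = 2e9
print(cycle_time_ns(two_ghz), "ns per cycle")        # the slide's 2 GHz example
# Hypothetical program that needs 10 billion cycles
print(execution_time_s(10e9, two_ghz), "s per program")
```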

How many cycles are required for a program?
Assumption: # of cycles = # of instructions
Assumption is incorrect
– Typically, different instructions take different numbers of clock cycles
– Integer multiply instructions are multi-cycle
– Floating point instructions are multi-cycle
[Figure: timeline with the 1st through 6th instructions laid out along the time axis]
[Figure: timeline with the 1st through 5th instructions laid out along the time axis]

Now that we understand cycles
A given program will require
– some number of instructions (machine instructions)
– some number of cycles
– some number of seconds
We have a vocabulary that relates these quantities:
– cycle time (seconds per cycle)
– clock rate (cycles per second)
– CPI (cycles per instruction)
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

CPI Example 2.1
Suppose we have two implementations of the same instruction set architecture (ISA). For some program:
– Machine A has a clock cycle time of 10 ns and a CPI of 2.0
– Machine B has a clock cycle time of 20 ns and a CPI of 1.2
Which machine is faster for this program, and by how much?
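One way to work Example 2.1 is to note that both machines run the same ISA, so the instruction count is the same and cancels out; only CPI x cycle time matters. The sketch below follows that reasoning; the function name is a hypothetical helper.

```python
def time_per_instruction_ns(cpi, cycle_time_ns):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ns

a = time_per_instruction_ns(2.0, 10)   # Machine A
b = time_per_instruction_ns(1.2, 20)   # Machine B

# Same ISA and same program, so instruction count is identical on both machines
ratio = b / a
print(f"A: {a:.0f} ns/instr, B: {b:.0f} ns/instr, A is {ratio:.1f} times faster")
```

Machine A wins despite its higher CPI because its clock cycle is half as long.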

Compiler Optimization: Example 2.2
A compiler designer is trying to decide between two code sequences for a particular machine with 3 instruction classes.
– CPI: Class A = 1, Class B = 2, and Class C = 3
– Code sequence X has 5 instructions: 2 of A, 1 of B, and 2 of C
– Code sequence Y has 6 instructions: 4 of A, 1 of B, and 1 of C
Which sequence will be faster? By how much?
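Example 2.2 can be checked by counting cycles per sequence. This is a sketch under the assumption that both sequences run on the same machine at the same clock rate, so the cycle counts decide the outcome; the names `CLASS_CPI`, `total_cycles`, and `average_cpi` are illustrative.

```python
CLASS_CPI = {"A": 1, "B": 2, "C": 3}

def total_cycles(counts):
    """Sum of (instructions in class) x (CPI of class)."""
    return sum(CLASS_CPI[cls] * n for cls, n in counts.items())

def average_cpi(counts):
    """Total cycles divided by total instruction count."""
    return total_cycles(counts) / sum(counts.values())

seq_x = {"A": 2, "B": 1, "C": 2}   # 5 instructions
seq_y = {"A": 4, "B": 1, "C": 1}   # 6 instructions

for name, seq in (("X", seq_x), ("Y", seq_y)):
    print(name, total_cycles(seq), "cycles, CPI =", average_cpi(seq))
print("Y is", round(total_cycles(seq_x) / total_cycles(seq_y), 2), "times faster")
```

Sequence Y executes more instructions yet finishes sooner, which is why instruction count alone is not a performance measure.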

Execution time: Contributing Factors
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Improve performance => reduce execution time
– Reduce instruction count (Programmer, Compiler)
– Reduce cycles per instruction (ISA, Machine designer)
– Reduce clock cycle time (Hardware designer, Process engineer)
 | Instruction Count | CPI | Clock Rate
Program | X | |
Compiler | X | (X) |
ISA | X | X |
Organization | | X | X
Technology | | | X

Performance
Performance is determined by program execution time
Do any of the other variables equal performance?
– # of cycles to execute program?
– # of instructions in program?
– # of cycles per second?
– average # of cycles per instruction?
– average # of instructions per second?
Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.

Performance Variation
$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$
 | Number of instructions | CPI | Clock Cycle Time
Same machine, different programs | different | similar | same
Same programs, different machines, same ISA | same | different | different
Same programs, different machines | somewhat different | different | different

Amdahl’s Law
The impact of a performance improvement is limited by the percent of execution time affected by the improvement
$$\text{Execution time after improvement} = \frac{\text{Execution time affected}}{\text{Amount of improvement}} + \text{Execution time unaffected}$$
Make the common case fast!
Amdahl’s law sets a limit on how much improvement can be made

Amdahl’s Law: Example 2.3
Example: A program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
Is it possible to get the program to run 5 times faster?
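Example 2.3 can be solved by rearranging Amdahl's law for the amount of improvement. The sketch below does that; `required_improvement` is a hypothetical helper, and returning `None` for an unreachable target is a choice made here for illustration.

```python
def required_improvement(total_s, affected_s, target_speedup):
    """Factor by which the affected part must speed up, or None if unreachable.

    From: time after = affected / improvement + unaffected.
    """
    unaffected_s = total_s - affected_s
    target_time = total_s / target_speedup
    budget = target_time - unaffected_s   # time left for the improved part
    if budget <= 0:
        return None                        # unaffected time alone uses up the target
    return affected_s / budget

print(required_improvement(100, 80, 4))   # multiply must become 16 times faster
print(required_improvement(100, 80, 5))   # None: the 20 s that is not multiply remains
```

The 5x target fails because the 20 seconds outside multiplication already equal the entire target time, so multiplication would have to take zero time.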

Metrics of Performance
[Figure: layers of a computer system (Application, Programming Language, Compiler, ISA, Datapath and Control, Function Units, Transistors/Wires/Pins) annotated with the metrics listed below]
– Answers per month
– Useful Operations per second
– (millions) of Instructions per second: MIPS
– (millions) of (F.P.) operations per second: MFLOP/s
– Megabytes per second
– Cycles per second (clock rate)
Each metric has a place and a purpose, and each can be misused

Benchmarks
Performance best determined by running a real application
– Use programs typical of expected workload
– Or, typical of expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.
Small benchmarks
– nice for architects and designers
– easy to standardize
– can be abused
SPEC (System Performance Evaluation Cooperative)
– companies have agreed on a set of real programs and inputs
– can still be abused (Intel’s “other” bug)
– valuable indicator of performance (and compiler technology)

SPEC ‘89
Compiler “enhancements” and performance
[Figure: bar chart of SPEC performance ratio (axis 0 to 800) per benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing “Compiler” with “Enhanced compiler”]

SPEC ‘95*
Benchmark | Description
go | Artificial intelligence; plays the game of Go
m88ksim | Motorola 88k chip simulator; runs test program
gcc | The Gnu C compiler generating SPARC code
compress | Compresses and decompresses file in memory
li | Lisp interpreter
ijpeg | Graphic compression and decompression
perl | Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex | A database program
tomcatv | A mesh generation program
swim | Shallow water model with 513 x 513 grid
su2cor | Quantum physics; Monte Carlo simulation
hydro2d | Astrophysics; hydrodynamic Navier-Stokes equations
mgrid | Multigrid solver in 3-D potential field
applu | Parabolic/elliptic partial differential equations
turb3d | Simulates isotropic, homogeneous turbulence in a cube
apsi | Solves problems regarding temperature, wind velocity, and distribution of pollutant
fpppp | Quantum chemistry
wave5 | Plasma physics; electromagnetic particle simulation
SPEC CPU2000 consists of 12 integer and 14 floating point benchmarks

SPEC ‘95
Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better performance?
[Figure: two plots, SPECint and SPECfp (axis 0 to 10) versus clock rate (50 to 250 MHz), each with curves for Pentium and Pentium Pro]

MIPS, MFLOPS etc.
• Hard to relate MIPS/MFLOPS with execution time
• Highly program-dependent metrics
• Deceptive (See pages 78 - 79 in the text)
• MIPS would be higher for a program using simple instructions (e.g. a loop of NOPs!)
• MIPS - million instructions per second
$$\text{MIPS} = \frac{\text{number of instructions executed in program}}{\text{execution time in seconds} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
• MFLOPS - million floating point operations per second
$$\text{MFLOPS} = \frac{\text{number of floating point operations executed in program}}{\text{execution time in seconds} \times 10^6}$$
$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$

RISC Processor: Example 2.4
Base Machine (Reg / Reg)
Op | Freq | Cycles | CPI(i) | % Time
ALU | 50% | 1 | |
Load | 20% | 5 | |
Store | 10% | 3 | |
Branch | 20% | 2 | |
• What is average CPI?
• What percentage of time is spent in each instruction class?
• How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
• How does this compare with using branch prediction to shave a cycle off the branch time?
• What if two ALU instructions could be executed at once?
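The questions in Example 2.4 can be explored numerically. The sketch below assumes that the instruction count and clock cycle time stay fixed, so speedup is the ratio of average CPIs, and it models "two ALU instructions at once" as an effective ALU cost of 0.5 cycles; both are modeling assumptions made here, and all function names are illustrative.

```python
def average_cpi(mix):
    """Weighted CPI: sum of frequency x cycles over all instruction classes."""
    return sum(freq * cycles for freq, cycles in mix.values())

def time_shares(mix):
    """Fraction of execution time spent in each instruction class."""
    cpi = average_cpi(mix)
    return {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}

def speedup(mix, **new_cycles):
    """Speedup when some classes get new cycle counts (same IC, same clock)."""
    changed = {op: (freq, new_cycles.get(op, cycles))
               for op, (freq, cycles) in mix.items()}
    return average_cpi(mix) / average_cpi(changed)

base = {"ALU": (0.5, 1), "Load": (0.2, 5), "Store": (0.1, 3), "Branch": (0.2, 2)}

print("average CPI:", round(average_cpi(base), 2))
print({op: f"{share:.1%}" for op, share in time_shares(base).items()})
print("loads in 2 cycles:  ", round(speedup(base, Load=2), 3))
print("branches in 1 cycle:", round(speedup(base, Branch=1), 3))
print("two ALU ops at once:", round(speedup(base, ALU=0.5), 3))
```

Loads are only 20% of the instructions but take the largest share of the time, so improving them pays off the most.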

Performance assessment is tricky
Performance is specific to particular program(s)
– Total execution time is a consistent summary of performance
For a given architecture performance increases come from:
– increases in clock rate (without adverse CPI effects)
– improvements in processor organization that lower CPI
– compiler enhancements that lower CPI and/or instruction count
Pitfall: expecting an improvement in one aspect of a machine’s performance to improve total performance by a proportional amount
You should not always believe everything you read! Read carefully!

Datapath and Control Design
The Five Classic Components of a Computer
– Control (part of the Processor)
– Datapath (part of the Processor)
– Memory
– Input
– Output

Two Types of Logic Components
Combinational
– Elements that operate on data values
– Produces same output if given same inputs
[Figure: combinational logic block with inputs A and B and output C = f(A,B)]
State Elements
– Contains internal storage
– May produce different output if given same inputs
– Clock is used to determine when a state element should be written
[Figure: state element with inputs A, B, and clk and output C = f(A,B,state)]

Clock
Clock is a free running signal
– Fixed cycle time (period)
– Frequency = 1/(cycle time)
– Duty Cycle: (% high)/(% low), e.g. 50/50 Duty Cycle below
– Jitter: Uncertainty/wobble in rising or falling edge
[Figure: clock waveform marking one clock cycle (period), a rising edge, and a falling edge]

Storage Elements
D Latch
• Two inputs:
– the data value to be stored (D)
– the clock signal (C) indicating when to read & store D
• Two outputs:
– the value of the internal state (Q) and its complement (_Q)
[Figure: D latch with inputs D and C and outputs Q and _Q]
Falling edge triggered D flip-flop
• Output changes only on the clock edge
[Figure: falling edge triggered D flip-flop built from two D latches, with inputs D and C and outputs Q and _Q]

Edge-triggered Clocking
Values stored in the machine are updated on a clock edge
– The clock edge can be either rising or falling
By default a state element is written every clock edge
– An explicit write control signal is required otherwise.
Edge-triggered methodology allows all of the following in the same clock cycle:
– read the contents of a register
– send the value through some combinational logic, and
– write the contents of the same or another register
Possible to have the same state element as input and output
[Figure: within one clock cycle, state element 1 feeds combinational logic, which feeds state element 2]
[Figure: within one clock cycle, state element 1 feeds combinational logic, which feeds back into state element 1]

CPU: Clocking
All storage elements are clocked by the same clock edge
[Figure: Clk timing diagram showing Setup and Hold windows around each clock edge and Don’t Care regions between them, for storage elements driven by a common CLK]

Register: A Storage Element
– Similar to the D Flip Flop except
• N-bit input and output
• Write Enable input
– Write Enable:
• 0: Data Out will not change
• 1: Data Out will become Data In (on the clock edge)
[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]

Register File
Register File consists of (32) registers:
– Two 32-bit output busses: busA and busB
– One 32-bit input bus: busW
Register is selected by:
– RA selects the register to put on busA
– RB selects the register to put on busB
– RW selects the register to be written via busW when Write Enable is 1
Clock input (CLK)
[Figure: register file of 32 32-bit registers with 5-bit selects RW, RA, and RB, 32-bit busW input, 32-bit busA and busB outputs, Write Enable, and Clk]
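The register file interface described above can be mimicked in software. The following is a behavioral sketch only, not the hardware design: the class and method names are invented here, reads are treated as combinational, and writes take effect when `clock_edge` is called. MIPS-specific details that the slide does not mention (such as a hardwired zero register) are not modeled.

```python
class RegisterFile:
    """Behavioral model: two read ports (busA, busB) and one write port (busW)."""

    def __init__(self, count=32, width=32):
        self.mask = (1 << width) - 1
        self.regs = [0] * count

    def read(self, ra, rb):
        """Combinational read: RA selects busA, RB selects busB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """On the clock edge, write busW into register RW if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & self.mask

rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=42, write_enable=False)
before = rf.read(3, 0)          # write was disabled, register 3 is unchanged
rf.clock_edge(rw=3, bus_w=42, write_enable=True)
after = rf.read(3, 3)           # both read ports may select the same register
print(before, after)
```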

Memory
– One input bus: Data In
– One output bus: Data Out
Memory word is selected by:
– Address selects the word to put on Data Out
– Write Enable = 1: address selects the memory word to be written via the Data In bus
Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block: Address valid => Data Out valid after “access time.”
[Figure: memory block with Address, 32-bit Data In, 32-bit Data Out, Write Enable, and Clk]

Basic 4 x 2 Static RAM
[Figure: 4 x 2 static RAM made of eight D latches (each with D, C, Enable, and Q), a 2-to-4 decoder driven by Address that selects rows 0 to 3, a Write enable input, data inputs Din[1] and Din[0], and data outputs Dout[1] and Dout[0]]

Register Transfer Language (RTL)
Mechanism for describing the movement and manipulation of data between storage elements.
Example:
R[rd] <- R[rs] + R[rt]
PC <- PC + 4
R[rt] <- Mem[R[rs] + immediate]

A Simple Implementation of MIPS CPU
Simplified to contain only:
– Memory-reference instructions: lw, sw
– Arithmetic-logical instructions: add, sub, and, or, slt
– Control flow instructions: beq, j
Execution Time = Instructions * CPI * Cycle Time
Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction
We will design a single cycle processor:
– Advantage: One clock cycle per instruction
– Disadvantage: long cycle time

Arithmetic Instructions (R-Type)
ADD, SUB, AND, OR, SLT
Example: add rd, rs, rt
e.g. add $t3, $s0, $s5
REG[$t3] = REG[$s0] + REG[$s5]
Field | op | rs | rt | rd | shamt | funct
Bits | 31-26 | 25-21 | 20-16 | 15-11 | 10-6 | 5-0
Width | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
Example | | s0 | s5 | t3 | |
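The R-type layout can be exercised by packing the example instruction into a 32-bit word. This is a sketch: `encode_r_type` is a hypothetical helper, and the register numbers ($s0 = 16, $s5 = 21, $t3 = 11) and the add function code 0x20 come from the standard MIPS conventions rather than from this slide.

```python
def encode_r_type(rs, rt, rd, shamt, funct, op=0):
    """Pack R-type fields: op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t3, $s0, $s5  ->  rd = $t3 (11), rs = $s0 (16), rt = $s5 (21)
S0, S5, T3 = 16, 21, 11      # standard MIPS register numbers (assumed, not on the slide)
ADD_FUNCT = 0x20             # standard MIPS funct code for add (assumed)

word = encode_r_type(rs=S0, rt=S5, rd=T3, shamt=0, funct=ADD_FUNCT)
print(f"0x{word:08X}")
```

Note that the destination register comes first in the assembly syntax but third among the register fields in the encoded word.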

Load/Store Instructions (I-Type)
LW, SW
Examples:
lw rt, rs, imm16
sw rt, rs, imm16
e.g. lw $s3, -4($s2)
REG[$s3] = D-MEM[ REG[$s2] - 4 ]
Field | op | rs | rt | immediate
Bits | 31-26 | 25-21 | 20-16 | 15-0
Width | 6 bits | 5 bits | 5 bits | 16 bits
Example | | s2 | s3 | -4
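The I-type layout and the handling of a negative immediate can be illustrated the same way. In this sketch the helper names are invented, and the lw opcode 0x23 and register numbers ($s2 = 18, $s3 = 19) are taken from the standard MIPS conventions, not from the slide.

```python
def encode_i_type(op, rs, rt, imm16):
    """Pack I-type fields: op[31:26] rs[25:21] rt[20:16] immediate[15:0]."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)

def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

LW_OP = 0x23                 # standard MIPS opcode for lw (assumed)
S2, S3 = 18, 19              # standard MIPS register numbers (assumed)

# lw $s3, -4($s2)  ->  rs = $s2 (base), rt = $s3 (destination), immediate = -4
word = encode_i_type(LW_OP, rs=S2, rt=S3, imm16=-4)
print(f"0x{word:08X}")
print(sign_extend16(word))   # recovers the signed offset from the low 16 bits
```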

Branch (I-Type)
Beq
Example: beq rs, rt, imm16
e.g. 0x4c beq $s1, $t3, -12
if( REG[$s1] == REG[$t3] ) {
    new_PC = old_PC + 4 - 12   # new_PC = 0x44
}
else {
    new_PC = old_PC + 4        # new_PC = 0x50
}
Field | op | rs | rt | displacement
Bits | 31-26 | 25-21 | 20-16 | 15-0
Width | 6 bits | 5 bits | 5 bits | 16 bits
Example | | s1 | t3 | -3
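The branch example stores -3 in the displacement field because the field counts words, while the assembly offset of -12 counts bytes. A small sketch of the next-PC computation, with a hypothetical function name, makes the relation concrete.

```python
def beq_next_pc(pc, reg_rs, reg_rt, displacement_words):
    """Next PC for beq: PC + 4, plus displacement x 4 when the registers are equal."""
    next_pc = pc + 4
    if reg_rs == reg_rt:
        next_pc += displacement_words * 4   # word displacement to byte offset
    return next_pc

taken = beq_next_pc(0x4C, reg_rs=7, reg_rt=7, displacement_words=-3)
not_taken = beq_next_pc(0x4C, reg_rs=7, reg_rt=8, displacement_words=-3)
print(hex(taken), hex(not_taken))
```

The register values 7 and 8 are arbitrary placeholders chosen only to make the branch taken or not taken.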

Jump (J-Type)
J
Example: j Label
e.g.
0x8000 0000 j Label
…
Label: 0x8444 4444 ADD $s3, $s5, $t1
…
new_PC = 0x8444 4444
Field | op | target address
Bits | 31-26 | 25-0
Width | 6 bits | 26 bits
Example | | 0x111 1111
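The slide gives the 26-bit target field and the resulting PC but not the rule that connects them. The sketch below uses the standard MIPS convention (upper 4 bits of PC + 4, then the target field, then two zero bits), which is an assumption not stated on the slide; the function names are illustrative.

```python
def jump_target_field(label_address):
    """26-bit target field: the label's word address with the top 4 bits dropped."""
    return (label_address >> 2) & 0x03FFFFFF

def jump_new_pc(pc, target26):
    """new_PC = upper 4 bits of (PC + 4), concatenated with target26 and '00'."""
    return ((pc + 4) & 0xF0000000) | (target26 << 2)

field = jump_target_field(0x84444444)
new_pc = jump_new_pc(0x80000000, field)
print(hex(field), hex(new_pc))
```

Because only 26 bits are stored, a jump can reach targets only within the same 256 MB region as the instruction that follows it.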

Single Cycle Implementation Datapath and Control
Complete Execution of a Single Instruction
– Instruction Fetch
– Instruction Decode
– Operand Fetch
– Execute
– Result Store
– Next Instruction
[Figure: one clock cycle spanning I. Fetch, Decode, Op. Fetch, Execute, Store, and Next PC]

Components Required to implement the ISA
Next PC generation
– Add 4 or extended 16-bit immediate to PC
Memory
– Instruction read
– Data read/write
Registers (32 x 32-bit)
– Read register rs
– Read register rt
– Write register rt or rd
Sign extend immediate operand
ALU to operate on the operands

Abstract / Simplified View: Datapath
[Figure: abstract datapath with PC, Instruction memory (Address, Instruction), Registers (three Register # inputs, Data), ALU, and Data memory (Address, Data)]

CPU: Instruction Fetch
• RTL version of the instruction fetch step:
– Fetch the Instruction: mem[PC]
– Update the program counter:
• Sequential Code: PC <- PC + 4
• Branch and Jump: PC <- “something else”
[Figure: PC (clocked by Clk) drives the Address input of the Instruction Memory, which outputs the 32-bit Instruction Word; an Adder adds 4 to PC]

CPU: Register-Register Operations (Add, Subtract etc.)
R[rd] <- R[rs] op R[rt]   Example: addU rd, rs, rt
– Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
– ALUctr and RegWr: control logic after decoding the instruction
Field | op | rs | rt | rd | shamt | funct
Bits | 31-26 | 25-21 | 20-16 | 15-11 | 10-6 | 5-0
Width | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
[Figure: register file of 32 32-bit registers with Rw = Rd, Ra = Rs, Rb = Rt; busA and busB feed the ALU (controlled by ALUctr); the 32-bit Result returns on busW; RegWr and Clk control the write]

CPU: Load Operations
R[rt] <- Mem[R[rs] + SignExt[imm16]]   Example: lw rt, rs, imm16
Field | op | rs | rt | immediate
Bits | 31-26 | 25-21 | 20-16 | 15-0
Width | 6 bits | 5 bits | 5 bits | 16 bits
[Figure: load datapath with the register file (Rw chosen between Rd and Rt by the RegDst mux, Ra = Rs, RegWr, Clk), Extender (ExtOp) widening imm16 from 16 to 32 bits, ALUSrc mux, ALU (ALUctr), Data Memory (Adr, Data In, WrEn driven by MemWr, Clk), and W_Src mux selecting the value placed on busW]

CPU: Store Operations
Mem[R[rs] + SignExt[imm16]] <- R[rt]   Example: sw rt, rs, imm16
Field | op | rs | rt | immediate
Bits | 31-26 | 25-21 | 20-16 | 15-0
Width | 6 bits | 5 bits | 5 bits | 16 bits
[Figure: store datapath with the register file (RegDst mux, Ra = Rs, Rb = Rt, RegWr, Clk), Extender (ExtOp) widening imm16 from 16 to 32 bits, ALUSrc mux, ALU (ALUctr), Data Memory (Adr, Data In, WrEn driven by MemWr, Clk), and W_Src mux]

CPU: Datapath for Branching
beq rs, rt, imm16
Datapath generates condition (equal)
Field | op | rs | rt | immediate
Bits | 31-26 | 25-21 | 20-16 | 15-0
Width | 6 bits | 5 bits | 5 bits | 16 bits
[Figure: branch datapath with the register file (Ra = Rs, Rb = Rt) whose busA and busB outputs feed an Equal? comparator that produces Cond; PC (Clk), an Adder for PC + 4, a second Adder fed by PC Ext of imm16, and a Mux controlled by nPC_sel; PC with “00” appended forms the Instruction Address]
PC Ext: sign extend to 32 bits and left shift by 2

CPU: Binary arithmetic for PC
• In theory, the PC is a 32-bit byte address into the instruction memory:
– Sequential operation: PC<31:0> = PC<31:0> + 4
– Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
• The magic number “4” always comes up because:
– The 32-bit PC is a byte address
– And all our instructions are 4 bytes (32 bits) long
• In other words:
– The 2 LSBs of the 32-bit PC are always zeros
– There is no reason to have hardware to keep the 2 LSBs
• In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
– Sequential operation: PC<31:2> = PC<31:2> + 1
– Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
– In either case: Instruction Memory Address = PC<31:2> concat “00”
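The equivalence of the 32-bit and 30-bit PC formulations can be checked with a short sketch. The function names are invented for illustration, and passing `imm16=None` to mean "sequential, no branch" is a convention chosen here.

```python
MASK32 = 0xFFFFFFFF
MASK30 = 0x3FFFFFFF

def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def next_pc32(pc, imm16=None):
    """32-bit byte-address PC: PC + 4, plus SignExt[Imm16] * 4 on a taken branch."""
    pc += 4
    if imm16 is not None:
        pc += sign_extend16(imm16) * 4
    return pc & MASK32

def next_pc30(pc30, imm16=None):
    """30-bit word-address PC<31:2>: PC + 1, plus SignExt[Imm16] on a taken branch."""
    pc30 += 1
    if imm16 is not None:
        pc30 += sign_extend16(imm16)
    return pc30 & MASK30

def instruction_address(pc30):
    """Instruction Memory Address = PC<31:2> concat '00'."""
    return pc30 << 2

pc = 0x4C
for imm in (None, 0xFFFD):          # sequential, then a branch with displacement -3
    wide = next_pc32(pc, imm)
    narrow = instruction_address(next_pc30(pc >> 2, imm))
    print(hex(wide), hex(narrow))
```

Both formulations produce the same instruction address, which is why the hardware can drop the two low-order bits.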

Single Cycle Implementation
Putting it all together
– Supports Arithmetic, Load/Store and Branch instructions
[Figure: complete single cycle datapath with PC, Instruction memory (Read address, Instruction [31–0]), Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2) fed by Instruction [25–21], Instruction [20–16], and Instruction [15–11], Sign extend of Instruction [15–0] from 16 to 32 bits, ALU (Zero, ALU result) with ALU control fed by Instruction [5–0], Data memory (Address, Write data, Read data), an Add for PC + 4, Shift left 2 with a second Add, and four muxes; control signals RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, and PCSrc]