ENG241 Digital Design
Week #9 Register Transfer and Data Paths
Fall 2014 ENG241/Digital Design 2
Week #9 Topics
Data Paths and Operations
The Arithmetic/Logic Unit
Register Transfer Operations
Micro-Operations
Multiplexer-Based Transfer
Bus-Based Transfer
Complete Data Path Design
Pipelining
Resources
Chapter #7, Mano
Sections:
7.2 Register Transfers
7.3 Register Transfer Operations
7.4 VHDL and RTL
7.5 Micro Operations
7.6 Multiplexer Based Transfers
7.8 Bus Based Transfers
Parts of CPUs
Datapath: registers, multiplexers, adders, subtractors, and the combinational logic that performs operations on them.
Control unit: generates the signals that control the datapath and accepts status signals to perform sequencing.
Memory and I/O
Control Unit + Data Path + Memory + Input/Output = Microcomputer System
Arithmetic/Logic Unit (ALU)
The ALU is a combinational circuit that performs a set of basic arithmetic and logic operations.
An adder can perform addition, subtraction, …
Select lines are used to determine the operation to be performed.
ALU Design using Hierarchy
This ALU has:
Two control lines, S0 and S1, that select the arithmetic operation
A third line, S2, that selects the logical operations
Start designing in parts.
One Stage ALU
Design a 1-bit arithmetic unit.
Design a 1-bit logic unit.
Combine the two units to form a 1-bit arithmetic/logic unit.
Replicate the stage as many times as needed to form an n-bit ALU.
Arithmetic Circuit
The basic component of an arithmetic circuit is an n-bit ripple-carry adder (parallel adder).
By controlling the data inputs to the parallel adder, it is possible to obtain different types of arithmetic operations (Cin is also an input).
Select lines S0, S1 can be used to control input Y. Why?
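The idea of steering the adder's Y input can be sketched in Python. The encoding of S1, S0 below (Y = 0, B, NOT B, all ones) is an assumption borrowed from the usual textbook function table; the slide's own table did not survive extraction, and the name `arithmetic_unit` is made up for illustration.

```python
def arithmetic_unit(a, b, s1, s0, cin, n=4):
    """Model an n-bit parallel adder whose Y input is derived from B by S1, S0."""
    mask = (1 << n) - 1
    # Assumed B input logic: 00 -> 0, 01 -> B, 10 -> NOT B, 11 -> all ones
    y = [0, b & mask, ~b & mask, mask][(s1 << 1) | s0]
    total = (a & mask) + y + cin
    return total & mask, total >> n  # (G, Cout)


# Eight arithmetic results from one adder, with A = 5 and B = 3
results = [arithmetic_unit(5, 3, s1, s0, cin)[0]
           for s1 in (0, 1) for s0 in (0, 1) for cin in (0, 1)]
# Order: A, A+1, A+B, A+B+1, A+B', A-B, A-1, A
```

Under this assumed encoding, one adder plus a small amount of input logic covers transfer, increment, add, subtract, and decrement, which is one possible answer to the "Why?" above.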
Looking Inside
Function table: how do we design the B input logic?
What functionality can be achieved by controlling the 'Y' value fed to the n-bit adder?
Design of B Select Logic
Use an 8-to-1 mux (the straightforward solution).
Or … use a 4-to-1 mux!
Can we do better? YES: simplify the expression from the truth table using a K-map.
1-bit (Single Stage) Arithmetic Circuit
After simplification, the B input logic is nothing but a 2-to-1 mux instead of the 4-to-1 mux.
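A quick way to check the mux claim is to enumerate the truth table. The sketch below assumes the textbook encoding of the B input logic (S1 S0 = 00 gives 0, 01 gives B, 10 gives NOT B, 11 gives 1), which simplifies to Y = B·S0 + B'·S1; the function names are hypothetical.

```python
from itertools import product


def y_table(b, s1, s0):
    # Assumed function table for one bit of Y
    return [0, b, 1 - b, 1][(s1 << 1) | s0]


def y_sop(b, s1, s0):
    # K-map result: Y = B*S0 + B'*S1
    return (b & s0) | ((1 - b) & s1)


def y_mux2(b, s1, s0):
    # 2-to-1 mux with B as the select line: B = 1 passes S0, B = 0 passes S1
    return s0 if b else s1


all_equal = all(
    y_table(b, s1, s0) == y_sop(b, s1, s0) == y_mux2(b, s1, s0)
    for b, s1, s0 in product((0, 1), repeat=3)
)
```

All eight input combinations agree, so under that assumed table a single 2-to-1 mux per bit is enough.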
4-Bit Circuit
Duplicating the one stage four times will produce a 4-bit circuit
Logic Section Design
Generous number of operations
Arithmetic/Logic Unit
The logic circuit can be combined with the arithmetic circuit to produce an ALU.
I. Selection variables S1 and S0 can be common to both circuits.
II. A third selection variable S2 can be used to differentiate between the logic and arithmetic operations.
One Stage Arithmetic Circuit
One Stage Logic Circuit
One Stage ALU
Mux to choose Arithmetic or Logic
n-bit ALU
Duplicate the one stage n times!!
Resulting Control
The one-stage ALU can provide:
I. 8 arithmetic operations, and
II. 4 logic operations.
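The count of 8 arithmetic and 4 logic operations can be illustrated with a small behavioral model. The specific operations chosen here (the Y-input encoding for arithmetic, and AND, OR, XOR, NOT for logic) are assumptions based on the common textbook ALU, since the slide's function table is not recoverable; `alu` is a hypothetical name.

```python
def alu(a, b, s2, s1, s0, cin=0, n=4):
    """Behavioral sketch of an n-bit ALU: S2 = 0 arithmetic, S2 = 1 logic."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    sel = (s1 << 1) | s0
    if s2 == 0:
        # Arithmetic: adder with Y chosen from 0, B, NOT B, all ones (assumed)
        y = [0, b, ~b & mask, mask][sel]
        return (a + y + cin) & mask
    # Logic: Cin is ignored; the four operations are assumed
    return [a & b, a | b, a ^ b, ~a & mask][sel]


arithmetic_codes = [(0, s1, s0, c) for s1 in (0, 1) for s0 in (0, 1) for c in (0, 1)]
logic_codes = [(1, s1, s0) for s1 in (0, 1) for s0 in (0, 1)]
```

S1 and S0 are shared by both halves, and S2 plays the role of the final mux that picks the arithmetic or the logic result.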
Register Transfer Language (RTL)
Register Transfer Language (RTL) is used to describe CPU organization in high-level terms.
RTL expressions are made up of elements that describe the registers being manipulated and the micro-ops being performed on them.
Here are the basic components of RTL expressions:
Register Transfer Language (RTL)
Registers are named in uppercase: PC, IR (instruction register), R3.
The operations performed on the data in registers are called microoperations.
Micro-Operations
Basic operations of the datapath. Examples:
1. Moving data from one register to another
2. Adding the contents of two registers
3. Incrementing the contents of a register
The control unit provides the signals that sequence the micro-operations in a prescribed manner.
The results of a currently executing micro-operation may determine both the sequence of control signals and the sequence of future micro-operations to be executed (e.g. BNE).
A micro-operation is expected to complete in one clock cycle.
RTL
Transfer from R1 to R2: R2 ← R1
1. R2 is the destination
2. R1 is the source
Conditional: if (K1 = 1) then (R2 ← R1)
Shorter form: K1: R2 ← R1
Transfer
K1: R2 ← R1
The transfer occurs at the clock edge when K1 is high. The registers are n bits wide.
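The conditional transfer K1: R2 ← R1 can be modeled as a register with a load enable. This is only a behavioral sketch; the `Register` class and its `clock` method are hypothetical names, and each call to `clock` stands for one positive clock edge.

```python
class Register:
    """n-bit register with a load-enable input (behavioral model)."""

    def __init__(self, n_bits, value=0):
        self.mask = (1 << n_bits) - 1
        self.q = value & self.mask

    def clock(self, d, load):
        # One clock edge: capture d only when load is high, otherwise hold
        if load:
            self.q = d & self.mask


r1 = Register(8, 0x5A)
r2 = Register(8, 0x00)

r2.clock(r1.q, load=0)  # K1 = 0: R2 keeps its old value
held = r2.q
r2.clock(r1.q, load=1)  # K1 = 1: R2 <- R1
```

The source register is not changed by the transfer; only the destination is updated, and only on an edge where K1 is 1.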
Symbols
Note memory transfers: DR ← M[AR] (contents of memory)
Syntax not VHDL (similar)
Types of Microoperations
1. Transfer (have just looked at)
2. Arithmetic
3. Logic
4. Shift
Arithmetic
Basic ops (addition, subtraction, ..): R0 ← R1 + R2
Subtraction by 2's complement
Notation is Shorthand for Hardware
Consider
X'·K1: R1 ← R1 + R2
and
X·K1: R1 ← R1 + R2' + 1
(a prime denotes the complement)
Note the overflow and carry registers.
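The pair of transfers above describes an adder-subtractor in which X selects between adding R2 and adding its 2's complement. A minimal sketch, assuming 8-bit registers and the hypothetical function name `add_sub`:

```python
def add_sub(r1, r2, x, n=8):
    """X = 0: R1 + R2.  X = 1: R1 + NOT(R2) + 1, i.e. R1 - R2 in 2's complement."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    operand = (~r2 & mask) if x else (r2 & mask)
    total = (r1 & mask) + operand + x  # X also feeds the carry-in
    result = total & mask
    carry = total >> n
    # Signed overflow: both adder inputs have the same sign, the result differs
    overflow = int(((r1 ^ result) & (operand ^ result) & sign) != 0)
    return result, carry, overflow


added = add_sub(100, 50, x=0)
subtracted = add_sub(100, 50, x=1)
negative = add_sub(5, 10, x=1)
```

With 8 bits, 100 + 50 sets the overflow flag because 150 does not fit in a signed byte, while 100 - 50 produces a carry out of 1 and no overflow.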
Logic Microoperations
The OR notation is a little confusing. The following shows two types of syntax for OR: "+" in the control condition and "∨" in the micro-operation.
(K1 + K2): R1 ← R1 ∨ R2
Shift Microoperations
Here just the basic one-bit shifts
Bit falls off the end, zero shifted in
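The one-bit shifts can be sketched as follows, assuming 8-bit registers; the names `shift_left` and `shift_right` are illustrative only.

```python
def shift_left(r, n=8):
    # The most significant bit falls off; a zero enters on the right
    return (r << 1) & ((1 << n) - 1)


def shift_right(r, n=8):
    # The least significant bit falls off; a zero enters on the left
    return (r & ((1 << n) - 1)) >> 1


left = shift_left(0b10010011)
right = shift_right(0b10010011)
```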
Multiplexer-Based Transfers
There are occasions when a register receives data from two or more different sources at different times.
Recall that multiplexers are used to conditionally transfer values from the input to the output.
Multiplexer-Based Transfers
Consider
if (K1 = 1) then (R0 ← R1) else if (K2 = 1) then (R0 ← R2)
Which can also be expressed as
K1: R0 ← R1, K1'·K2: R0 ← R2
Block diagram?
Multiplexer Block Diagram
K1: R0 ← R1, K1'·K2: R0 ← R2
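One way to realize this pair of transfers is a 2-to-1 mux feeding R0, with K1 as the mux select and K1 + K2 as the load enable. The wiring is an assumption (the block diagram is not available here), and `next_r0` is a hypothetical name.

```python
def next_r0(r0, r1, r2, k1, k2):
    """Value of R0 after one clock edge for K1: R0 <- R1, K1'K2: R0 <- R2."""
    mux_out = r1 if k1 else r2  # K1 selects R1, otherwise R2
    load = k1 | k2              # R0 loads only if some transfer is enabled
    return mux_out if load else r0


outcomes = {(k1, k2): next_r0(10, 11, 12, k1, k2)
            for k1 in (0, 1) for k2 in (0, 1)}
```

When both K1 and K2 are 1, R1 wins, which matches the priority implied by the if/else form.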
Detailed
Bus-Based Transfers
How about when there are lots of registers?
We can use buses and send data over a common set of wires.
Buses are a more efficient scheme for transferring data between registers!
Bus-Based Transfers
A bus is a shared transfer path. It is characterized by a set of common lines:
(i) Data, (ii) Control, (iii) Status
The control signals for the logic select a single source and one or more destinations on any clock cycle.
Simple Case: using Muxes!
Signals S1, S0 select the source
Signals L0, L1, L2 enable loading of the registers.
The single bus (on the right) uses one mux and one output bus, but it cannot achieve more simultaneous transfers than the system on the left!
Transfers
Only a single source per clock cycle
About ½ the hardware
Select/Load signals (table)
Limitations!
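The single-bus arrangement can be sketched as one shared mux plus per-register load enables. The select encoding (00 = R0, 01 = R1, 10 = R2) is an assumption, and `bus_cycle` is a hypothetical name.

```python
def bus_cycle(regs, s1, s0, loads):
    """One clock cycle on a single mux-based bus.

    regs  : current values of R0, R1, R2
    s1,s0 : select the single source placed on the bus (assumed encoding)
    loads : L0, L1, L2 load enables, one per register
    """
    select = (s1 << 1) | s0
    if select >= len(regs):
        raise ValueError("select code has no source register")
    bus = regs[select]
    return [bus if load else r for r, load in zip(regs, loads)]


one_dest = bus_cycle([1, 2, 3], 1, 0, (1, 0, 0))   # R0 <- R2
two_dests = bus_cycle([1, 2, 3], 0, 1, (1, 0, 1))  # R0 <- R1, R2 <- R1
```

Several destinations can load in the same cycle, but only from the one source on the bus. An exchange such as R0 ← R1 together with R1 ← R0 needs two sources at once, so it cannot be done in a single cycle on this bus.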
Three-State Bus
Remember that three-state drivers allow multiple outputs to share a wire.
Note that the small inverted triangle denotes the 3-state output of the register.
A bus can be constructed with three-state buffers.
Many three-state buffer outputs can be connected together to form a bit line of a bus, with less delay than multiplexer-based systems.
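A three-state bus line can be modeled by letting each driver either place its value on the line or stay in the high-impedance state. In this sketch `None` stands for Hi-Z and the function name `tristate_bus` is hypothetical; real hardware would see contention rather than an exception.

```python
def tristate_bus(values, enables):
    """Resolve a bus driven by three-state buffers (at most one enabled)."""
    driving = [v for v, e in zip(values, enables) if e]
    if len(driving) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    # No enabled driver leaves the bus floating (Hi-Z), modeled as None
    return driving[0] if driving else None


selected = tristate_bus([7, 8, 9], [0, 1, 0])
floating = tristate_bus([7, 8, 9], [0, 0, 0])
```

The control logic is responsible for enabling at most one driver per cycle, which is the role the mux select lines play in the multiplexer-based bus.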
Same Example with 3-State
Notice that both systems in the figure have the same capability in terms of transfers.
However, the 3-state bus:
1. Has fewer wires
2. Is easier to expand!
Memory Transfers
Usually one or more buses are associated with memory:
Address
Data
Note that memory can be slower, so more complex timing may be needed:
Address on one clock cycle
Data latched at a later clock cycle
Properties of Memory
1. Volatile
Memory contents disappear if power goes out
Typical computer RAM: Static RAM (SRAM), Dynamic RAM (DRAM)
2. Nonvolatile
ROM
Flash memories
Magnetic memories like disk, tape
Simple View of RAM
Of some word size n
Some capacity 2^k
k bits of address line
A read line
A write line
Memory Transfer
Read: DR ← M[AR], where M denotes memory, DR denotes the Data Register, and AR denotes the Address Register
Write: M[AR] ← DR
Write: M[A1] ← D2
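The read and write transfers can be sketched with a small memory model of 2^k words, each n bits wide. The `Memory` class is hypothetical and ignores timing, so each access completes immediately.

```python
class Memory:
    """RAM model with 2**k words of n bits each (no timing modeled)."""

    def __init__(self, k, n):
        self.mask = (1 << n) - 1
        self.words = [0] * (1 << k)

    def read(self, address):
        return self.words[address]

    def write(self, address, data):
        self.words[address] = data & self.mask


m = Memory(k=4, n=8)
AR = 0x3
DR = 0xAB
m.write(AR, DR)   # M[AR] <- DR
DR = 0
DR = m.read(AR)   # DR <- M[AR]
```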
Memory Transfer
Data Paths --> ALU + Storage
Computer systems often employ a number of storage elements in conjunction with a shared operation unit called an Arithmetic/Logic Unit (ALU) to form a data path.
To perform a micro-operation, the contents of the specified source registers are applied to the inputs of the shared ALU.
The ALU performs an operation, and the result of this operation is transferred to a destination register.
Data Paths, single clock cycle
Since the ALU is designed as a pure combinational circuit, the entire register transfer operation from the source registers, through the ALU, and into the destination register is performed in one clock cycle.
Datapath
A simple bus-based data path: four registers, an ALU, and a shifter.
Each register is connected to two multiplexers to form ALU input buses A and B (Register File).
Another mux is used to choose between the registers and a constant.
Functional Unit: ALU and a shifter
Datapath
Blue signals are generated by the control unit.
The decoder, along with the load-enable signal, determines the destination register (R0, R1, R2, R3).
Datapath
MB Select determines whether source B is a register or a constant.
G Select determines the operation to be performed by the ALU.
MF Select determines whether the output comes from the ALU or the shifter.
MD Select determines whether the input to the Register File comes from the Functional Unit or external data.
Datapath
Four status bits are shown (V, C, N, Z) that can be used by the control unit.
It is useful to have certain information based on the results of an ALU operation available for use by the control unit to make decisions:
Make corrections
Skip an instruction
Loops
If/else statements
…
Example: R1 ← R2 + R3
Signals?
A, B select
MB Select
G Select
MF Select
MD Select
Destination (D)
Load enable
What about timing?
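The signal list for this example can be made concrete with a behavioral model of the data path. Everything about the encoding is an assumption for illustration: 8-bit registers, select polarity (0 = register / ALU / Functional Unit, 1 = constant / shifter / external data), string names for the G Select operations, and a shifter that simply shifts bus B left by one bit.

```python
MASK = 0xFF  # assumed 8-bit registers


def datapath_step(regs, a_sel, b_sel, mb, g, mf, md, dest, load,
                  constant=0, data_in=0):
    """One clock cycle of a four-register data path (behavioral sketch)."""
    a_bus = regs[a_sel]
    b_bus = constant if mb else regs[b_sel]   # MB Select
    alu_ops = {                               # G Select (hypothetical names)
        "ADD": a_bus + b_bus,
        "SUB": a_bus - b_bus,
        "AND": a_bus & b_bus,
        "OR": a_bus | b_bus,
    }
    alu_out = alu_ops[g] & MASK
    shifter_out = (b_bus << 1) & MASK         # hypothetical shifter function
    f = shifter_out if mf else alu_out        # MF Select
    d_bus = data_in if md else f              # MD Select
    new_regs = list(regs)
    if load:                                  # Load enable + destination decoder
        new_regs[dest] = d_bus & MASK
    return new_regs


regs = [0, 0, 20, 22]
# R1 <- R2 + R3
after = datapath_step(regs, a_sel=2, b_sel=3, mb=0, g="ADD",
                      mf=0, md=0, dest=1, load=1)
```

Under these assumptions the example needs A select = R2, B select = R3, MB = register, G = add, MF = ALU, MD = Functional Unit, destination = R1, and load enable = 1.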
Timing
All of this can occur in one clock cycle, but the signals must be available in time to propagate through the muxes and the ALU and be at the register inputs by the next positive edge.
Datapath
Higher-level view for hierarchical design
Can replace modules with same interface but different implementation
Performance Improvement
In addition to providing a data path that performs the necessary register transfer micro operations, we need to be concerned about the speed or rate at which the micro operations are performed. How?
I. First we need to know the maximum speed at which our data path can be run.
II. Then we will explore how we can make it faster (pipelining).
Pipelining
Pipelining exploits parallelism at the instruction level.
Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
Today pipelining is key to making processors fast.
Pipelining: Example
Laundry: Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
"Folder" takes 20 minutes
Sequential Laundry
Sequential laundry takes 6 hours for 4 loads (90 x 4 = 360 minutes).
If they learned pipelining, how long would laundry take?
[Figure: sequential laundry timeline, 6 PM to midnight. Tasks A, B, C, D run one after another in task order, each taking 30 + 40 + 20 minutes.]
Pipelining Lessons
Total time: 210 minutes!! versus 360 with no pipelining
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to "fill" the pipeline and time to "drain" it reduce speedup
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
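The 360 and 210 minute figures can be reproduced with a short calculation. The pipelined formula assumes identical loads that each start as early as possible, so the slowest stage sets the pace once the pipeline is full; the function names are illustrative.

```python
def sequential_time(stages, n_tasks):
    # Each load finishes all stages before the next one starts
    return sum(stages) * n_tasks


def pipelined_time(stages, n_tasks):
    # First load takes the full latency; each later load adds one slowest-stage time
    return sum(stages) + (n_tasks - 1) * max(stages)


stages = [30, 40, 20]  # washer, dryer, "folder" in minutes
seq = sequential_time(stages, 4)
pipe = pipelined_time(stages, 4)
speedup = seq / pipe
```

The speedup is about 1.7 rather than 3 because the stages are unbalanced and only four loads pass through, so fill and drain time is a large share of the total.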
[Figure: pipelined laundry timeline starting at 6 PM. Tasks A, B, C, D overlap in task order; the intervals along the time axis are 30, 40, 40, 40, 40, 20 minutes.]
Assembly Line Analogy to Data Path Pipeline
A custom product being built may pass the assembly line many times before it is completed.
A conveyor belt moves components from stage to stage
This technique increases throughput
Conventional Data Path Timing
The figure shows the maximum delay values for each of the components of a typical data path:
1. 4 ns (3 ns + 1 ns) to read two operands from the register file.
2. 4 ns to perform an operation.
3. 4 ns (1 ns + 3 ns) to write the result back.
Total: 12 ns to perform a single micro-operation.
The rate of execution is then set at 1/12 ns = 83.3 MHz.
Can we make it faster?
Pipelined Data Path Timing
We can break up the 12 ns delay by inserting registers between the different components of the system.
A register is inserted between the register file and the function unit (after OF).
Another register can be inserted between the function unit and MUX D (between EX and WB).
3-stage pipeline: OF / EX / WB
The maximum delay is now 5 ns, allowing a maximum clock frequency of 200 MHz.
Pipelining
3 stages:
Operand Fetch
Execute
Write Back
Pipelining
Conventional data path: 7 x 12 ns = 84 ns
Pipelined data path: 9 x 5 ns = 45 ns
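These totals follow from a simple cycle count: a k-stage pipeline needs n + k - 1 clock cycles for n micro-operations, because k - 1 extra cycles are spent filling and draining. The sketch below uses the delay values quoted on the slides; the function names are illustrative.

```python
def conventional_time_ns(n_ops, delay_ns):
    # One micro-operation per (long) clock cycle
    return n_ops * delay_ns


def pipelined_time_ns(n_ops, n_stages, stage_ns):
    # n_ops + n_stages - 1 (short) clock cycles in total
    return (n_ops + n_stages - 1) * stage_ns


def clock_mhz(period_ns):
    return 1000.0 / period_ns


conventional = conventional_time_ns(7, 12)
pipelined = pipelined_time_ns(7, 3, 5)
```

For 7 micro-operations in 3 stages that is 9 cycles of 5 ns, so the pipelined data path finishes in 45 ns instead of 84 ns.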
Summary
Data paths are an essential part of any CPU.
ALUs (Arithmetic Logic Units) are at the heart of any data path.
Multiplexers and tri-state buffers are used extensively in data paths (data movement).
Pipelining is a technique to improve throughput by overlapping instruction execution.