Computer Systems Organization and Architecture Topic 3: Processor Design

CPU must:
◦ Fetch instructions (suruhan ambil)
◦ Interpret _______ (tafsir _______)
◦ _______ data (_______ data)
◦ Process data (proses data)
◦ Write data (tulis data)

Registers form the highest level of the memory hierarchy (hierarki ingatan)◦ Small set of high speed storage locations◦ ______ storage for data and control information

Two types of registers◦ User-visible

May be referenced by assembly-level instructions (suruhan paras perhimpunan) and are thus “_______” to the user

◦ Control (kawalan) and _______ registers Used to control the operation of the CPU Most are not visible to the user

General categories based on function◦ General purpose (Serba guna)

Can be assigned a variety of functions Ideally, they are defined _______ to the operations

within the instructions◦ _______

These registers only hold data◦ Address (Alamat)

These registers only hold _______ information Examples: general purpose address registers,

segment pointers, stack pointers, index registers◦ _______ codes (Kod _______)

Visible to the user but values set by the CPU as the result of performing operations

Example code bits: _______, _______, overflow (limpahan)

Bit values are used as the basis for conditional jump instructions (suruhan lompat bersyarat)

Design trade off (tukar ganti) between general purpose and specialized registers◦ General purpose registers _______ flexibility in

instruction design◦ _______ purpose registers permit implicit register

specification in instructions - reduces register field size in an instruction

◦ No clear “best” design approach How many registers are enough?

◦ More registers permit more operands (kendalian) to be held within the CPU - reducing memory bandwidth requirements to some extent

◦ More registers cause an _______ in the field sizes needed to specify registers in an instruction word

◦ Locality of reference may not support too many registers

◦ Most machines use _______registers

How big (wide)?◦ Address registers should be _______ enough to hold the

longest address◦ Data registers should be wide enough to hold most data

types Would not want to use _______-bit registers if the vast

majority of data operations used 16 and 32-bit operands

Related to width of memory _______ bus
Concatenate registers together to store longer formats
B-C registers in the 8085
AccA-AccB registers in the 68HC11

These registers are used during the _______, decoding (penyahkodan) and _______ of instructions◦ Many are not visible to the user / programmer◦ Some are visible but can not be (easily) modified

Typical registers◦ _______ counter (PC)

Points to the next instruction to be executed◦ _______ register (IR)

Contains the instruction being executed (most recently)

◦ Memory _______ register (MAR) Contains the address of a location in memory

◦ Memory _______ / _______ register (MBR) Contains a word of data to be written to memory or

the word most recently read◦ Program _______ word(s)

Superset of condition code register Interrupt masks, supervisory modes, etc. Status information

A set of bits Includes Condition Codes _______

◦ Contains the sign of the result of the last arithmetic operation _______

◦ Set when the result is 0 _______

◦ Set if an operation resulted in a carry (addition) into or borrow (subtraction) out of a high-order bit

_______◦ Set if a logical compare result is equality

_______◦ Used to indicate arithmetic overflow

Interrupt enable/disable◦ Used to enable or disable interrupts

Supervisor◦ Indicates whether the CPU is executing in supervisor or user

mode
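The condition-code bits described above can be illustrated with a small sketch. It assumes an 8-bit word and uses the hypothetical names `add_with_flags`, `sign`, `zero`, `carry`, and `overflow`; real processors differ in which flags they provide and how they name them.

```python
def add_with_flags(x: int, y: int, bits: int = 8) -> tuple[int, dict[str, int]]:
    """Add two bit patterns and derive condition-code bits from the result."""
    mask = (1 << bits) - 1          # e.g. 0xFF for an 8-bit word
    sign_bit = 1 << (bits - 1)      # position of the high-order (sign) bit
    x &= mask
    y &= mask
    full = x + y                    # unbounded sum, may exceed the word
    result = full & mask            # what actually fits in the register
    flags = {
        "sign": int(bool(result & sign_bit)),   # sign of the result
        "zero": int(result == 0),               # set when the result is 0
        "carry": int(full > mask),              # carry out of the high-order bit
        # signed overflow: operands share a sign, result has the other sign
        "overflow": int(bool(~(x ^ y) & (x ^ result) & sign_bit)),
    }
    return result, flags
```

A conditional jump would then test one of these bits, for example branching only when `flags["zero"]` is 1.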

_______ Cycle◦ May require memory access

to fetch operands◦ _______ addressing requires

more memory accesses◦ Can be thought of as

additional instruction ________

Depends on CPU design
In general:
_______
◦ PC contains _______ of next instruction
◦ Address moved to _______
◦ Address placed on address bus
◦ Control unit requests memory read
◦ Result placed on _______ bus, copied to MBR, then to IR
◦ Meanwhile PC _______ by 1

IR is examined
If indirect addressing, indirect cycle is _______
◦ Rightmost N bits of _______ transferred to _______
◦ Control unit requests memory _______
◦ Result (address of _______) moved to MBR
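The register transfers in the fetch and indirect cycles can be traced with a minimal sketch. The `CPU` class, the word-addressed `memory` list, and the 12-bit address field (`ADDR_BITS`) are assumptions for illustration only; the slides do not fix an instruction format.

```python
from dataclasses import dataclass

ADDR_BITS = 12  # assumed width N of the address field in an instruction word


@dataclass
class CPU:
    memory: list[int]
    pc: int = 0
    mar: int = 0
    mbr: int = 0
    ir: int = 0

    def fetch(self) -> None:
        self.mar = self.pc                 # address of next instruction to MAR
        self.mbr = self.memory[self.mar]   # memory read, result arrives in MBR
        self.ir = self.mbr                 # instruction copied to IR
        self.pc += 1                       # in hardware this overlaps the read

    def indirect(self) -> None:
        # rightmost N bits of the fetched word hold the address reference
        self.mar = self.mbr & ((1 << ADDR_BITS) - 1)
        self.mbr = self.memory[self.mar]   # MBR now holds the operand's address
```

For example, with `memory[0] = 0x1005` and `memory[5] = 7`, calling `fetch()` then `indirect()` leaves 7 (the operand address) in `mbr`.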

May take many forms Depends on _______ being executed May include

◦ _______ read/write◦ Input/Output◦ _______ transfers◦ _______ operations

_______ _______
Current PC saved to allow resumption after interrupt
Contents of PC copied to MBR
Special memory location (e.g. _______ pointer) loaded to MAR
MBR written to _______
PC loaded with address of interrupt handling routine
Next instruction (first of _______ handler) can be fetched
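The same style of sketch can follow the steps of saving the PC and entering the handler. It assumes the special memory location is a stack pointer held in a hypothetical `SP` register and that the stack grows toward lower addresses; both are illustrative choices, not something the slides specify.

```python
def interrupt_cycle(regs: dict[str, int], memory: list[int], handler_address: int) -> None:
    """Save the current PC in memory and redirect execution to the handler."""
    regs["MBR"] = regs["PC"]             # contents of PC copied to MBR
    regs["SP"] -= 1                      # assumption: stack grows downward
    regs["MAR"] = regs["SP"]             # save location loaded into MAR
    memory[regs["MAR"]] = regs["MBR"]    # MBR written to memory
    regs["PC"] = handler_address         # next fetch is the handler's first instruction
```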

Prefetch◦ Fetch accessing main _______◦ Execution usually does not _______ main memory◦ Can fetch next instruction during execution of current

instruction◦ Called instruction _______

Improved Performance◦ But not doubled:

Fetch usually _______ than execution Prefetch more than one instruction?

Any jump or _______ means that prefetched instructions are not the required instructions

◦ Add more _______ to improve performance

The Central Processing Unit (CPU) is the _______ combination (kombinasi lojik) of the _______ _______ _______ (ALU) and the system’s control unit

In this sub-section, we focus on the ALU and its operation◦ Overview of the ALU◦ Data representation (Perwakilan data)◦ Computer Arithmetic and its hardware implementation

The ALU is that part of the computer that actually performs _______ and _______ operations on data

All other elements of the computer system are there mainly to bring _______ to the ALU for processing or to take _______ from the ALU

Registers are used as _______ and _______ for most ALU operations

In early machines, _______ and _______ determined the overall structure of the CPU and its ALU◦ Result was that machines were built around a single

register, known as the __________ (penumpuk)◦ The __________ was used in almost all ALU related

_________

The _______ and _______of the CPU and the ALU is improved through increases in the complexity of the hardware◦ Use _______ register sets to store operands, addresses

and results◦ _______ the capabilities of the ALU◦ Use special hardware to support _______ of execution

between points in a program◦ _______ functional units within the ALU to permit

concurrent operations Problem: design a minimal cost yet fully functional ALU

◦ What building block components would be included?

Solution:◦ Only 2 basic _______ are required to produce a fully

functional ALU A bit-wide _______ _______ unit A 2-input _______ gate

◦ NAND is a functionally complete logic operation◦ Similarly, if you can add, all other arithmetic operations

can be derived from addition.◦ To conduct operations on _______ bit words is clearly

tedious (menjemukan)!◦ Goal then is to develop arithmetic and logic circuitry that

is algorithmically _______ while remaining cost effective
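To make the functional-completeness claim concrete, the sketch below derives NOT, AND, OR and XOR from a single 2-input NAND and then wires them into a one-bit full adder. The function names are illustrative, and building the adder itself out of NAND gates goes one step further than the two-device solution above requires.

```python
def nand(a: int, b: int) -> int:
    """The only primitive gate: output is 0 only when both inputs are 1."""
    return 1 - (a & b)


def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))      # De Morgan: a OR b = NOT(NOT a AND NOT b)


def xor_(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder built only from the NAND-derived gates above."""
    partial = xor_(a, b)
    total = xor_(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out
```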

_______-_______ format◦ Positional representation using n bits◦ Left most bit position is the sign bit

0 for _______ number 1 for _______ number

◦ Remaining n-1 bits represent the _______◦ Range: {-(2^(n-1) - 1), +(2^(n-1) - 1)}◦ Problems:

Sign must be considered during arithmetic operations Dual representation of zero (-0 and +0)

Ones ______________ format◦ Binary case of diminished (menyusut) _______

complement ◦ Negative numbers are represented by a bit-by-bit

______________ of the (positive) magnitude (the process of negation)

◦ Sign bit interpreted as in sign-magnitude format◦ Examples (8-bit words):

+42 = 0 0101010
-42 = 1 1010101

◦ Still have a _______ representation for zero (all zeros and all ones)
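The two formats above can be compared with a short sketch. The helper names are hypothetical and an 8-bit word is assumed; the decoder shows that the all-ones pattern is a second encoding of zero.

```python
def sign_magnitude(value: int, bits: int = 8) -> str:
    """Sign bit followed by the (bits - 1)-bit magnitude."""
    magnitude = abs(value)
    if magnitude >= 1 << (bits - 1):
        raise ValueError("magnitude does not fit")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, f"0{bits - 1}b")


def ones_complement(value: int, bits: int = 8) -> str:
    """Negative values are the bit-by-bit complement of the positive magnitude."""
    magnitude = abs(value)
    if magnitude >= 1 << (bits - 1):
        raise ValueError("magnitude does not fit")
    mask = (1 << bits) - 1
    pattern = (~magnitude & mask) if value < 0 else magnitude
    return format(pattern, f"0{bits}b")


def ones_complement_value(pattern: str) -> int:
    """Decode a ones complement bit string back to an integer."""
    raw = int(pattern, 2)
    mask = (1 << len(pattern)) - 1
    return -(~raw & mask) if pattern[0] == "1" else raw
```

Both `ones_complement_value("00000000")` and `ones_complement_value("11111111")` decode to 0, which is the dual-zero problem noted above.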

Twos ______________ format
◦ Binary case of radix complement
◦ Negative numbers, -X, are represented by the pseudo-positive number 2^n - |X|
◦ With 2^n symbols
2^(n-1) - 1 _______ numbers
2^(n-1) _______ numbers

◦ Given the representation for +X, the representation for -X is found by taking the 1s complement of +X and adding 1

◦ Caution: do not confuse the 2s complement _______ (representation) with the 2s complement _______

◦ Converting between two word lengths (e.g., convert an 8-bit format into a 16-bit format) requires a sign extension: The _______ bit is extended from its current location up

to the new location All bits in the extension take on the value of the old

_______ bit

+18 = 00010010 (8 bits)
+18 = 00000000 00010010 (16 bits)
-18 = 11101110 (8 bits)
-18 = 11111111 11101110 (16 bits)
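The twos complement rules and the sign extension examples above can be checked with a minimal sketch. The function names are illustrative, and bit patterns are handled as strings purely for readability.

```python
def twos_complement(value: int, bits: int = 8) -> str:
    """Encode a signed integer; negatives become 2^bits - |value|."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def negate(pattern: str) -> str:
    """Take the ones complement of the pattern, then add 1."""
    bits = len(pattern)
    mask = (1 << bits) - 1
    inverted = ~int(pattern, 2) & mask
    return format((inverted + 1) & mask, f"0{bits}b")


def sign_extend(pattern: str, new_bits: int) -> str:
    """Copy the old sign bit into every newly added high-order position."""
    return pattern[0] * (new_bits - len(pattern)) + pattern
```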

Use of a single _______ adder is the simplest hardware◦ Must implement an n-repetition for-loop for an n-bit

addition◦ This is lots of _______ for a typical addition

Use a _______ adder unit instead◦ n full adder units cascaded together◦ In adding X and Y together unit i adds Xi and Yi to

produce SUMi and CARRYi◦ Carry out of each stage is the carry in to the next stage◦ Worst case add time is n times the delay of each unit --

despite the _______ operation of each adder unit -- Order (n) delay

◦ With signed numbers, watch out for _______: when adding 2 positive or 2 negative numbers, _______ has occurred if the result has the _______ sign
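A ripple adder and the signed overflow check can be sketched as a loop over cascaded full adders. This is a software model under the assumption of an 8-bit word; the loop order mirrors the carry rippling from stage to stage, which is why the worst-case delay grows with n.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One stage: returns (sum bit, carry out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_add(x: int, y: int, bits: int = 8) -> tuple[int, int, bool]:
    """Add two bit patterns stage by stage; returns (sum, carry out, overflow)."""
    carry = 0
    result = 0
    for i in range(bits):                      # stage i waits for the carry of stage i-1
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    top = bits - 1
    sign_x, sign_y, sign_r = (x >> top) & 1, (y >> top) & 1, (result >> top) & 1
    # overflow: both operands have the same sign but the result's sign differs
    overflow = sign_x == sign_y and sign_r != sign_x
    return result, carry, overflow
```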

Alternatives to the ripple adder◦ Must allow for the worst case delay in a ripple adder◦ In most cases, _______ signals do not propagate through

the entire adder◦ Provide additional hardware to detect where carries will

occur or when the carry _______ is completed◦ Carry Completion Sensing Adders use additional circuitry

to detect the time when all carries are completed Signal control unit that add is finished Essentially an ______________ device Typical add times are O(log n)

◦ Carry ___________ Adders Predict in advance what adder stage of a ripple adder

will generate a carry out Use prediction to avoid the carry propagation delays --

generate all of the carries at once Add time is a _______, regardless of the width, n, of the

word -- O(1) Problem: prediction in stage i requires information from

all previous stages -- gates to implement this require large numbers of inputs, making this adder impractical for even moderate values of n
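The idea of generating all carries at once is usually expressed with per-stage generate and propagate signals. The sketch below uses that common formulation (the names `g`, `p` and `lookahead_add` are assumptions, not taken from the slides) and expands each carry into a flat sum of products, so the growing number of terms per stage is visible.

```python
def lookahead_add(x: int, y: int, bits: int = 4) -> tuple[int, int]:
    """Add using carries computed directly from generate/propagate signals."""
    g = [(x >> i) & (y >> i) & 1 for i in range(bits)]    # stage i generates a carry
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(bits)]  # stage i propagates a carry
    carries = [0]                                         # carry into stage 0
    for i in range(bits):
        # c[i+1] = g[i] | p[i]g[i-1] | p[i]p[i-1]g[i-2] | ... | p[i]...p[0]c[0]
        carry = g[i]
        chain = p[i]
        for j in range(i - 1, -1, -1):
            carry |= chain & g[j]
            chain &= p[j]
        carry |= chain & carries[0]
        carries.append(carry)
    total = 0
    for i in range(bits):
        total |= (p[i] ^ carries[i]) << i                 # sum bit of stage i
    return total, carries[bits]
```

In hardware every carry expression is evaluated in parallel; the inner loop here only enumerates the product terms, and its length for stage i illustrates why gate fan-in becomes a problem as n grows.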

To perform X-Y, realize that X-Y = X+(-Y)

Therefore, the following hardware is “typical”
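In twos complement, forming -Y means complementing Y and adding 1, so a subtractor can reuse the adder with an inverted operand and a carry-in of 1. A minimal sketch, assuming an 8-bit word and a hypothetical `subtract` helper:

```python
def subtract(x: int, y: int, bits: int = 8) -> int:
    """Compute X - Y as X + (ones complement of Y) + 1, truncated to the word."""
    mask = (1 << bits) - 1
    return (x + (~y & mask) + 1) & mask
```

For example, `subtract(5, 7)` yields the pattern 11111110, which is -2 in 8-bit twos complement.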

A number of methods exist to perform integer multiplication◦ Repeated _______: add the multiplicand to itself

“multiplier” times◦ Shift and add -- traditional “pen and paper” way of

multiplying (extended to binary format)◦ High speed (special purpose) hardware multipliers

_______ addition
◦ Least sophisticated method
◦ Just use adder over and over again
◦ If the multiplier is n bits, can have as many as 2^n iterations of addition -- O(2^n)
◦ Not used in an _______

Shift and add◦ Computer’s version of the pen and paper approach:

      1011   (11)
   x  1101   (13)
   ========
      1011
     00000   Partial products
    101100
   1011000
   ========
  10001111   (143)

◦ The computer version accumulates the partial products into a running (partial) sum as the algorithm progresses

◦ Each partial product generation results in an _______ and _______ operation

Shift and add hardware for unsigned integers

Shift and add flowchart for unsigned integers
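The shift-and-add procedure for unsigned integers can be sketched with the register names commonly used for this hardware: M for the multiplicand, Q for the multiplier, A for the running sum and C for the carry. Those names, and the 4-bit default width, are assumptions for illustration.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Unsigned multiply: add M into A when Q0 is 1, then shift C, A, Q right."""
    mask = (1 << bits) - 1
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    c = 0
    for _ in range(bits):
        if q & 1:                      # low multiplier bit selects an add
            total = a + m
            c = total >> bits          # carry out of the n-bit add
            a = total & mask
        # shift right: C goes into A's top bit, A's low bit goes into Q's top bit
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (c << (bits - 1))
        c = 0
    return (a << bits) | q             # 2n-bit product held in A and Q
```

With the operands from the pen-and-paper example, `shift_add_multiply(0b1011, 0b1101)` returns 143.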

To multiply signed numbers (2s ____________)◦ Normal shift and add does not work (problem in the

basic algorithm of no sign extension to 2n bits)◦ ________ all numbers to their positive magnitudes,

multiply, then figure out the correct sign◦ Use a method that works for both positive and negative

numbers ________ algorithm is popular (recoding the multiplier)

◦ ________ algorithm
As in S&A, strings of 0s in the ________ only require shifting (no addition steps)
“Recode” strings of 1s to permit similar ________
String of 1s from 2^u down to 2^v is treated as 2^(u+1) - 2^v
In other words,
- At the right end of a string of 1s in the multiplier, perform a ________
- At the left end of the string perform an ________
- For all of the 1s in between, just do ________
Hardware modifications required in (Figure shift and add hardware for unsigned integers)
- Ability to perform ________
- Ability to perform ________ shifting rather than logical shifting (for sign extension)
- A flip flop for bit Q_-1
To determine ________ (add and shift, subtract and shift, shift) examine the bits Q_0 Q_-1
- 00 or 11: just shift
- 10: ________ and shift
- 01: ________ and shift

Booth’s algorithm for multiplication
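A minimal software sketch of the recoding scheme follows, using the standard textbook rule for the bit pair (Q0, Q-1). The names `a`, `q`, `q_1` and `m` are assumptions, as is the 4-bit default. Because A is only n bits wide in this model, the most negative multiplicand (for example -8 with 4 bits) is not handled, since its negation does not fit.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Signed multiply of two n-bit twos complement values (see limitation above)."""
    mask = (1 << bits) - 1
    top = 1 << (bits - 1)
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    q_1 = 0                                   # the extra flip flop to the right of Q0
    for _ in range(bits):
        pair = (q & 1, q_1)
        if pair == (1, 0):                    # right end of a string of 1s
            a = (a - m) & mask
        elif pair == (0, 1):                  # left end of a string of 1s
            a = (a + m) & mask
        # arithmetic shift right of A, Q, Q-1: A's sign bit is preserved
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & top)
    product = (a << bits) | q
    if product & (1 << (2 * bits - 1)):       # reinterpret the 2n-bit pattern as signed
        product -= 1 << (2 * bits)
    return product
```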

Advantages of Booth:- Treats positive and negative numbers

________- Strings of 1s and 0s can be skipped over

with shift operations for faster ________ time High performance multipliers

◦ ________ the computation time by employing more hardware than would normally be found in a S&A-type multiplier unit

◦ Not generally found in general-purpose processors due to expense

◦ Examples Combinational hardware multipliers Pipelined Wallace Tree adders from Carry-Save Adder

units

Once you have committed to implementing multiplication, implementing division is a relatively easy next step that utilizes much of the same hardware

Want to find quotient, Q, and remainder, R, such that D = Q x V + R

Restoring division for ________ integers◦ Algorithm adapted from the traditional “pen and paper”

approach◦ Algorithm is of time complexity O(n) for n-bit dividend◦ Uses essentially the same ALU hardware as the ________

multiplication algorithm Adder / subtractor unit ________ wide shift register AQ that can be shifted to the

left ________ for the divisor Control logic

Restoring division algorithm for unsigned integers
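Restoring division for unsigned integers can be sketched with the same A, Q and M registers. This is a software model under stated assumptions: A is kept as a signed Python integer, where hardware would use an extra sign bit, and the 4-bit default width is illustrative.

```python
def restoring_divide(dividend: int, divisor: int, bits: int = 4) -> tuple[int, int]:
    """Unsigned division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    mask = (1 << bits) - 1
    a = 0
    q = dividend & mask
    m = divisor & mask
    for _ in range(bits):
        # shift AQ left one position: Q's top bit moves into A's low bit
        a = (a << 1) | (q >> (bits - 1))
        q = (q << 1) & mask
        a -= m                         # trial subtraction
        if a < 0:
            a += m                     # result negative: restore A, quotient bit stays 0
        else:
            q |= 1                     # subtraction succeeded: quotient bit is 1
    return q, a
```

The loop runs once per quotient bit, which matches the O(n) time complexity stated above.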

For two’s complement numbers, must deal with the ________ extension “problem”

Algorithm:
◦ Load M with divisor, AQ with dividend (using sign bit extension)
◦ ________ AQ left 1 position
◦ If M and A have same sign, A ← A - M, otherwise A ← A + M
◦ Q0 ← 1 if sign bit of A has not changed or (A = 0 AND Q = 0), otherwise Q0 ← 0 and restore A
◦ Repeat ________ and +/- operations for all bits in Q
◦ Remainder is in A, quotient in Q

If the signs of the divisor and the dividend were the same, quotient is correct, otherwise, Q is the 2’s complement of the quotient

2’s complement division examples

________ fixed point schemes do not have the ability to represent very large or very small numbers

Need the ability to dynamically ________ the decimal point to a convenient location

Format: ±M x R^(±E)

Significand / mantissas are stored in a ________ format◦ Either 1.xxxxx or 0.1xxxxx◦ Since the 1 is required, don’t need to explicitly store it in

the data word -- insert it for calculations only Exponents can be positive or negative values

◦ Use ________ (Excess coding) to avoid operating on negative exponents

◦ ________ is added to all exponents to store as positive numbers

For a fixed n-bit representation length, 2^n combinations of symbols◦ If floating point ________ the range of numbers in the

format (compared to integer representation) then the “spacing” between the numbers must increase This causes a ________ in the format’s precision

◦ If more bits are allocated to the exponent, range is ________ at the expense of decreased precision

◦ Similarly, more significand bits increases the ________ and reduces the range

◦ The ________ is chosen at design time and is not explicitly represented in the format Small -- smaller range Large -- increased range but loss of significant bits as

a result of mantissa alignment when normalizing

Problems to deal with in the format◦ Representation of ________◦ Over and ________ and how to detect◦ ________ operations

IEEE 754 format
◦ Defines single and double ________ formats (32 and 64 bits)
◦ Standardizes formats across many different platforms
◦ Radix 2
◦ Single
Range 10^-38 to 10^+38
8-bit exponent with 127 bias
23-bit mantissa
◦ Double
Range 10^-308 to 10^+308
11-bit exponent with 1023 bias
52-bit mantissa

IEEE 754 Formats
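The single format's fields can be inspected with Python's standard `struct` module. The helper names `decode_single` and `rebuild` are hypothetical, and `rebuild` handles only normalized values (it ignores zero, denormals, infinities and NaN).

```python
import struct


def decode_single(x: float) -> tuple[int, int, int]:
    """Split a value, stored as an IEEE 754 single, into (sign, biased exponent, fraction)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF      # 8-bit exponent, stored with a bias of 127
    fraction = raw & 0x7FFFFF          # 23 stored bits; the leading 1 is implicit
    return sign, exponent, fraction


def rebuild(sign: int, exponent: int, fraction: int) -> float:
    """Recompute the value of a normalized single from its three fields."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
```

For -0.75, which is -1.1 (binary) x 2^-1, the fields come out as sign 1, biased exponent 126 and fraction 0x400000.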

Floating point arithmetic operations
◦ Addition and subtraction
________ significand
Add or subtract significand
Post ________
◦ Multiplication
________ exponents
Multiply significand
Post normalize
◦ Division
________ exponents
Divide significand
Post normalize
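The operation outlines above can be traced on a toy format. The sketch represents a number as an integer significand times a power of two, with an assumed significand width `P` of 8 bits and truncation instead of IEEE rounding; it is not an IEEE 754 implementation, and all names are illustrative.

```python
P = 8  # assumed significand width in bits for this toy format


def shift_right(sig: int, k: int) -> int:
    """Shift a signed significand right by k bits, truncating toward zero."""
    return -((-sig) >> k) if sig < 0 else sig >> k


def normalize(sig: int, exp: int) -> tuple[int, int]:
    """Adjust the exponent until the significand uses exactly P bits."""
    if sig == 0:
        return 0, 0
    while abs(sig) >= 1 << P:
        sig, exp = shift_right(sig, 1), exp + 1
    while abs(sig) < 1 << (P - 1):
        sig, exp = sig << 1, exp - 1
    return sig, exp


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Align to the larger exponent, add significands, then post-normalize."""
    (sa, ea), (sb, eb) = a, b
    if ea < eb:
        sa, ea, sb, eb = sb, eb, sa, ea
    sb = shift_right(sb, ea - eb)      # low-order bits may be lost here
    return normalize(sa + sb, ea)


def fp_mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add exponents, multiply significands, then post-normalize."""
    return normalize(a[0] * b[0], a[1] + b[1])


def value(x: tuple[int, int]) -> float:
    return x[0] * 2.0 ** x[1]
```

Subtraction follows from `fp_add` with a negated significand; division would subtract exponents and divide significands in the same pattern.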

In this section, we have focused on the operation of the CPU◦ Registers and their use◦ Instruction execution

Looked at the basic concepts associated with computer arithmetic◦ Number representation◦ Basic ALU construction◦ Hardware and software implementations of multiplication

and division operations◦ Floating point numbers and operations

Computer Organization and Architecture, 6th Edition. Stallings, W. Prentice Hall.

Computer Organization and Design. David A. Patterson, John L. Hennessy. Morgan Kaufmann
