49
1 ECE369 ECE 369 Chapter 3

ECE369: Fundamentals of Computer Architecture

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ECE369: Fundamentals of Computer Architecture

1ECE369

ECE 369

Chapter 3

Page 2: ECE369: Fundamentals of Computer Architecture

2ECE369

Our Goal

Page 3: ECE369: Fundamentals of Computer Architecture

3ECE369

Lets Build a Processor, Introduction to Instruction Set Architecture

• First Step Into Your Project !!!• How could we build a 1-bit ALU for add, and, or?

• Need to support the set-on-less-than instruction (slt)

– slt is an arithmetic instruction

– produces a 1 if a < b and 0 otherwise

– use subtraction: (a-b) < 0 implies a < b

• Need to support test for equality (beq $t5, $t6, $t7)

– use subtraction: (a-b) = 0 implies a = b

• How could we build a 32-bit ALU? 32

32

32

operation

result

a

b

ALU

Must Read Appendix B

Page 4: ECE369: Fundamentals of Computer Architecture

4ECE369

One-bit adder

• Takes three input bits and generates two output bits• Multiple bits can be cascaded

cout = a.b + a.cin + b.cinsum = a <xor> b <xor> cin

Page 5: ECE369: Fundamentals of Computer Architecture

5ECE369

Building a 32 bit ALU

b

0

2

Result

Operation

a

1

CarryIn

CarryOut

R e su lt31a3 1

b3 1

R e su lt0

C arryIn

a0

b0

R e su lt1a1

b1

R e su lt2a2

b2

O pe ra t io n

A LU 0

C arry In

C arry O u t

A LU 1

C arry In

C arry O u t

A LU 2

C arry In

C arry O u t

A LU 3 1

C arry In

Page 6: ECE369: Fundamentals of Computer Architecture

6ECE369

• Two's complement approach: just negate b and add.• How do we negate?

• A very clever solution:

What about subtraction (a – b) ?

b

0

2

Result

Operation

a

1

CarryIn

CarryOut

000 = and001 = or010 = add

0

2

Result

Operation

a

1

CarryIn

CarryOut

0

1

Binvert

b000 = and001 = or010 = add110 = subtract

Page 7: ECE369: Fundamentals of Computer Architecture

7ECE369

Supporting Slt

• Can we figure out the idea?

000 = and001 = or010 = add110 = subtract111 = slt

Page 8: ECE369: Fundamentals of Computer Architecture

8ECE369

Seta31

0

Result0a0

Result1a1

0

Result2a2

0

Operation

b31

b0

b1

b2

Result31

Overflow

Bnegate

Zero

ALU0Less

CarryIn

CarryOut

ALU1Less

CarryIn

CarryOut

ALU2Less

CarryIn

CarryOut

ALU31Less

CarryIn

Test for equality

• Notice control lines

000 = and001 = or010 = add110 = subtract111 = slt

• Note: Zero is a 1 if result is zero!

Page 9: ECE369: Fundamentals of Computer Architecture

9ECE369

How about “a nor b”

000 = and001 = or010 = add110 = subtract111 = slt

Page 10: ECE369: Fundamentals of Computer Architecture

10ECE369

Big Picture

Page 11: ECE369: Fundamentals of Computer Architecture

11ECE369

Conclusion

• We can build an ALU to support an instruction set– key idea: use multiplexor to select the output we want– we can efficiently perform subtraction using two’s complement– we can replicate a 1-bit ALU to produce a 32-bit ALU

• Important points about hardware– all of the gates are always working– speed of a gate is affected by the number of inputs to the gate– speed of a circuit is affected by the number of gates in series

(on the “critical path” or the “deepest level of logic”)• Our primary focus: comprehension, however,

– Clever changes to organization can improve performance(similar to using better algorithms in software)

• How about my instruction smt (set if more than)???

Page 12: ECE369: Fundamentals of Computer Architecture

12ECE369

ALU Summary

• We can build an ALU to support addition• Our focus is on comprehension, not performance• Real processors use more sophisticated techniques for arithmetic• Where performance is not critical, hardware description languages

allow designers to completely automate the creation of hardware!

Page 13: ECE369: Fundamentals of Computer Architecture

13ECE369

Overflow

Page 14: ECE369: Fundamentals of Computer Architecture

14ECE369

Formulation

Page 15: ECE369: Fundamentals of Computer Architecture

15ECE369

A Simpler Formula ?

Page 16: ECE369: Fundamentals of Computer Architecture

16ECE369

Problem: Ripple carry adder is slow!

• Is a 32-bit ALU as fast as a 1-bit ALU?• Is there more than one way to do addition?

• Can you see the ripple? How could you get rid of it?

c1 = a0b0 + a0c0 + b0c0c2 = a1b1 + a1c1 + b1c1 c2 = c3 = a2b2 + a2c2 + b2c2 c3 = c4 = a3b3 + a3c3 + b3c3 c4 =

• Not feasible! Why?

Page 17: ECE369: Fundamentals of Computer Architecture

17ECE369

Carry Bit

ininout bcacabc ++=

iiiiiii cbcabac ++=+1

0000001 cbcabac ++=

1111112 cbcabac ++=( ) ( )0000001000000111 cbcababcbcabaaba ++++++=

00100100100100100111 cbbcabbabcbacaabaaba ++++++=

2222223 cbcabac ++=

Page 18: ECE369: Fundamentals of Computer Architecture

18ECE369

Generate/Propagate

ai bi ci+1

0 00 11 01 1

0000001 cbcabac ++=

000001 )( cbabac ++=

1111112 cbcabac ++=

11111 )( cbaba ++=

[ ]000001111 )()( cbabababa ++++=

},{ iiii babacommon +=ai bi ci+1

0 00 11 01 1

iibagenerate = ii bapropagate +=

01

00

1

0

1

1

Page 19: ECE369: Fundamentals of Computer Architecture

19ECE369

Generate/Propagate (Ctd.)

iibagenerate =ii bapropagate +=

000001 )( cbabac ++=

000 cpg +=

111112 )( cbabac ++=

)( 00011 cpgpg ++=

001011 cppgpg ++=

iiii cpgc +=+ 1

Page 20: ECE369: Fundamentals of Computer Architecture

20ECE369

Carry-look-ahead adder

• Motivation: – If we didn't know the value of carry-in, what could we do?– When would we always generate a carry? gi = ai . bi– When would we propagate the carry? pi = ai + bi

• Did we get rid of the ripple?

c1 = g0 + p0c0c2 = g1 + p1c1 c2 = g1 + p1g0 + p1p0c0c3 = g2 + p2c2 c3 = g2 + p2g1 + p2p1g0 + p2p1p0c0c4 = g3 + p3c3 c4 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0c0

• Feasible! Why?c1 = a0b0 + a0c0 + b0c0c2 = a1b1 + a1c1 + b1c1 c2 = c3 = a2b2 + a2c2 + b2c2 c3 = c4 = a3b3 + a3c3 + b3c3 c4 =

a3 a2 a1 a0b3 b2 b1 b0

Page 21: ECE369: Fundamentals of Computer Architecture

21ECE369

A 4-bit carry look-ahead adder

• Generate g and p term for each bit

• Use g’s, p’s and carry in to generate all C’s

• Also use them to generate block G and P

• CLA principle can be used recursively

Page 22: ECE369: Fundamentals of Computer Architecture

22ECE369

16 Bit CLA

Page 23: ECE369: Fundamentals of Computer Architecture

23ECE369

Gate Delay for 16 bit Adder

iibagenerate =

ii bapropagate +=1

1+2

1+2+2

Page 24: ECE369: Fundamentals of Computer Architecture

24ECE369

Multiplication

• More complicated than addition– Accomplished via shifting and addition

• More time and more area

Page 25: ECE369: Fundamentals of Computer Architecture

25ECE369

Done

1. Test�Multiplier0

1a. Add multiplicand to product and�place the result in Product register

2. Shift the Multiplicand register left 1 bit

3. Shift the Multiplier register right 1 bit

32nd repetition?

Start

Multiplier0 = 0Multiplier0 = 1

No: < 32 repetitions

Yes: 32 repetitions

64-bit ALU

Control test

MultiplierShift right

ProductWrite

MultiplicandShift left

64 bits

64 bits

32 bits

Multiplication: Implementation

Page 26: ECE369: Fundamentals of Computer Architecture

26ECE369

Example

64-bit ALU

Control test

MultiplierShift right

ProductWrite

MultiplicandShift left

64 bits

64 bits

32 bits

Page 27: ECE369: Fundamentals of Computer Architecture

27ECE369

MultiplierShift right

Write

32 bits

64 bits

32 bits

Shift right

Multiplicand

32-bit ALU

Product Control test

Done

1. Test�Multiplier0

1a. Add multiplicand to the left half of�the product and place the result in�the left half of the Product register

2. Shift the Product register right 1 bit

3. Shift the Multiplier register right 1 bit

32nd repetition?

Start

Multiplier0 = 0Multiplier0 = 1

No: < 32 repetitions

Yes: 32 repetitions

Second version

64-bit ALU

Control test

MultiplierShift right

ProductWrite

MultiplicandShift left

64 bits

64 bits

32 bits

Page 28: ECE369: Fundamentals of Computer Architecture

28ECE369

Example

MultiplierShift right

Write

32 bits

64 bits

32 bits

Shift right

Multiplicand

32-bit ALU

Product Control test

Page 29: ECE369: Fundamentals of Computer Architecture

29ECE369

Control�testWrite

32 bits

64 bits

Shift rightProduct

Multiplicand

32-bit ALU

Done

1. Test�Product0

1a. Add multiplicand to the left half of�the product and place the result in�the left half of the Product register

2. Shift the Product register right 1 bit

32nd repetition?

Start

Product0 = 0Product0 = 1

No: < 32 repetitions

Yes: 32 repetitions

Final version

Page 30: ECE369: Fundamentals of Computer Architecture

30ECE369

Example

Control�testWrite

32 bits

64 bits

Shift rightProduct

Multiplicand

32-bit ALU

Page 31: ECE369: Fundamentals of Computer Architecture

31ECE369

Division

• Even more complicated– Can be accomplished via shifting and addition/subtraction

• More time and more area• Negative numbers: Even more difficult• There are better techniques, we won’t look at them

Page 32: ECE369: Fundamentals of Computer Architecture

32ECE369

Division: First version

Page 33: ECE369: Fundamentals of Computer Architecture

33ECE369

Example

Page 34: ECE369: Fundamentals of Computer Architecture

34ECE369

Division: Second version

Page 35: ECE369: Fundamentals of Computer Architecture

35ECE369

Improved Division

Page 36: ECE369: Fundamentals of Computer Architecture

36ECE369

Division (7÷2)

Page 37: ECE369: Fundamentals of Computer Architecture

37ECE369

Floating point (a brief look)

• We need a way to represent– Numbers with fractions, e.g., 3.1416– Very small numbers, e.g., 0.000000001– Very large numbers, e.g., 3.15576 x 109

• Representation:– Sign, exponent, fraction: (–1)sign x fraction x 2exponent

– More bits for fraction gives more accuracy– More bits for exponent increases range

• IEEE 754 floating point standard: – single precision: 8 bit exponent, 23 bit fraction– double precision: 11 bit exponent, 52 bit fraction

Page 38: ECE369: Fundamentals of Computer Architecture

38ECE369

IEEE 754 floating-point standard

• 1.f x 2e

• 1.s1s2s3s4…. snx2e

• Leading “1” bit of significand is implicit

• Exponent is “biased” to make sorting easier– All 0s is smallest exponent, all 1s is largest– Bias of 127 for single precision and 1023

for double precision

Page 39: ECE369: Fundamentals of Computer Architecture

39ECE369

Single Precision

–summary: (–1)sign x (1+significand) x 2(exponent – bias)

• Example:• 11/100 = 11/102= 0.11 = 1.1x10-1

–Decimal: -.75 = -3/4 = -3/22

–Binary: -.11 = -1.1 x 2-1

–IEEE single precision: 1 01111110 10000000000000000000000

–exponent-bias=-1 => exponent = 126 = 01111110

Page 40: ECE369: Fundamentals of Computer Architecture

40ECE369

Opposite Way

Sign Exponent Fraction

- 129 0x2-1+1x2-2=0.25

Page 41: ECE369: Fundamentals of Computer Architecture

41ECE369

Floating point addition

1.610x10-1 + 9.999x101

0.01610x101 + 9.999x101

10.015x101

1.0015x102

1.002x102

Page 42: ECE369: Fundamentals of Computer Architecture

42ECE369

Floating point addition

Still normalized?

4. Round the significand to the appropriatenumber of bits

YesOverflow orunderflow?

Start

No

Yes

Done

1. Compare the exponents of the two numbers.Shift the smaller number to the right until itsexponent would match the larger exponent

2. Add the significands

3. Normalize the sum, either shifting right andincrementing the exponent or shifting left

and decrementing the exponent

No Exception

Small ALU

Exponentdifference

Control

ExponentSign Fraction

Big ALU

ExponentSign Fraction

0 1 0 1 0 1

Shift right

0 1 0 1

Increment ordecrement

Shift left or right

Rounding hardware

ExponentSign Fraction

Page 43: ECE369: Fundamentals of Computer Architecture

43ECE369

Add 0.510 and -0.437510

Page 44: ECE369: Fundamentals of Computer Architecture

44ECE369

Multiplication

Page 45: ECE369: Fundamentals of Computer Architecture

45ECE369

Floating point multiply

• To multiply two numbers – Add the two exponent (remember access 127 notation)– Produce the result sign as exor of two signs– Multiply significand portions– Results will be 1x.xxxxx… or 01.xxxx….– In the first case shift result right and adjust exponent– Round off the result– This may require another normalization step

Page 46: ECE369: Fundamentals of Computer Architecture

46ECE369

Multiplication 0.510 and -0.437510

Page 47: ECE369: Fundamentals of Computer Architecture

47ECE369

Floating point divide

• To divide two numbers – Subtract divisor’s exponent from the dividend’s exponent

(remember access 127 notation)– Produce the result sign as exor of two signs– Divide dividend’s significand by divisor’s significand portions– Results will be 1.xxxxx… or 0.1xxxx….– In the second case shift result left and adjust exponent– Round off the result– This may require another normalization step

Page 48: ECE369: Fundamentals of Computer Architecture

48ECE369

Floating point complexities

• Operations are somewhat more complicated (see text)• In addition to overflow we can have “underflow”• Accuracy can be a big problem

– IEEE 754 keeps two extra bits, guard and round– Four rounding modes– Positive divided by zero yields “infinity”– Zero divide by zero yields “not a number”– Other complexities

• Implementing the standard can be tricky• Not using the standard can be even worse

– See text for description of 80x86 and Pentium bug!

Page 49: ECE369: Fundamentals of Computer Architecture

49ECE369

Chapter Three Summary

• Computer arithmetic is constrained by limited precision– Read pages 213-215

• Bit patterns have no inherent meaning but standards do exist– two’s complement– IEEE 754 floating point

• Operations are somewhat more complicated (see text)• In addition to overflow we can have “underflow”• Implementing the standard can be tricky• Not using the standard can be even worse (3.10)

– See text for description of 80x86 and Pentium bug!