Decimal Floating-point Multiplication via Carry-Save Addition
Mark Erle
Systems & Technology Group
International Business Machines
Brian Hickmann & Mike Schulte
Electrical & Computer Engineering
University of Wisconsin at Madison

Outline
• Introduction and motivation
• Extensions to fixed-point design
• Implementation highlights
• Verification and synthesis results
• Summary

Introduction
• Preponderance of business data in decimal form
• Inexact mapping between decimal and binary
• Decimal arithmetic used/required in banking, finance, insurance, accounting
• Increasing support in arithmetic community (IEEE P754 in ballot review process)
• Multiplication a key function
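The "inexact mapping between decimal and binary" point can be seen in a few lines. This is only an illustration using Python's standard `decimal` module as a stand-in for decimal arithmetic; it is not the hardware described in these slides.

```python
from decimal import Decimal

# 0.1, 0.2 and 0.3 have no exact binary floating-point representation,
# so the binary sum differs from the binary literal 0.3.
binary_equal = (0.1 + 0.2 == 0.3)

# The same values are exactly representable in decimal arithmetic.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(binary_equal)   # False
print(decimal_equal)  # True
```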

Motivation
• What's involved in extending fixed-point multiplication to support floating-point?
• What are the similarities and differences with BFP multiplication?

[Figure: multiplier flowchart with an Exponent/Control Dataflow section. Block labels: Read and Decode Operands; Generate Tuples; Select Tuples; Tuple Selection Control; Accumulate Partial Products / Develop Sticky; Sticky Counter Control; Iteration Control; Generate Intermediate Exp.; Shift Amount Control; Left Shift (Redundant) Intermediate Product; Add (Redundant) Shifted Intermediate Product; Round; Round Control; Select Significand; Exception Control; Finalize Exponent; Encode and Write Result. Legend: one marker for new functions to support floating-point, one for modified functions to support floating-point; the iterative portion is indicated.]

Intermediate Exponent Calculation
• Preferred exponent: PE = E_A + E_B - bias
• Based on location of the decimal point (effective shift right): IE_IP = PE + p
• After left shifting the intermediate product: IE_SIP = IE_IP - SLA
[Figure labels: Multiplicand (p digits); Tuple Generation and Selection; Accumulator; Intermediate Product Register (2p digits), effectively a shift register; location of decimal point.]
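As a worked instance of the three exponent formulas on this slide, here is a minimal Python sketch. The function name `intermediate_exponents` and the sample operand exponents are illustrative assumptions; the defaults p = 16 and bias = 398 are the IEEE 754-2008 decimal64 parameters, chosen because the synthesis results later in the talk use 16-digit operands.

```python
def intermediate_exponents(e_a, e_b, sla, p=16, bias=398):
    """Follow the slide's formulas for biased operand exponents e_a and e_b.

    sla is the shift-left amount applied to the intermediate product.
    """
    pe = e_a + e_b - bias   # preferred exponent
    ie_ip = pe + p          # decimal point moved p digits to the left
    ie_sip = ie_ip - sla    # exponent after left shifting by sla digits
    return pe, ie_ip, ie_sip

# Hypothetical operands: biased exponents 398 and 400, shift-left amount 10.
pe, ie_ip, ie_sip = intermediate_exponents(e_a=398, e_b=400, sla=10)
print(pe, ie_ip, ie_sip)  # 400 416 406
```

When SLA reaches its maximum of p, IE_SIP equals PE, so the result lands on the preferred exponent.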

Intermediate Product Shifting
• Based on leading zero counts of operands
• SLA may be off by one; need guard digit
• SLA = min(LZ_A + LZ_B, p)
• Shift right when IE_IP < Emin
[Figure labels: Intermediate Product Register (2p digits); location of decimal point; LZ_A + LZ_B; p.]
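The shift-amount rule, and why it can be off by one, can be sketched as follows. The helpers `leading_zeros` and `shift_left_amount` are hypothetical names, and Python integers stand in for p-digit significands; this illustrates the arithmetic, not the hardware.

```python
def leading_zeros(value, width):
    """Count leading zero digits of a non-negative integer in a width-digit field."""
    return width if value == 0 else width - len(str(value))

def shift_left_amount(a, b, p):
    """SLA = min(LZ_A + LZ_B, p), computed from the operands alone."""
    return min(leading_zeros(a, p) + leading_zeros(b, p), p)

p = 4
# 0099 x 0099 = 00009801: the product has exactly LZ_A + LZ_B = 4 leading zeros.
print(shift_left_amount(99, 99, p), leading_zeros(99 * 99, 2 * p))  # 4 4
# 0012 x 0034 = 00000408: the product has one more leading zero than LZ_A + LZ_B.
print(shift_left_amount(12, 34, p), leading_zeros(12 * 34, 2 * p))  # 4 5
```

In the second case, shifting left by SLA still leaves one leading zero, which is the situation the guard digit covers.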

Sticky Bit Generation
• Logically, all bits beyond the round digit must be ORed after left shifting
• SC = S_IP - p - 2, where 2 is for g and r
• Generate sticky bit on-the-fly, ORing one digit at a time while decrementing SC
• SC = max(0, p - (LZ_A + LZ_B))
– S_IP - p = ((p - LZ_A) + (p - LZ_B)) - p
– Calculate two cycles prior to when needed
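A minimal software sketch of on-the-fly sticky generation follows. It assumes the product digits become available least significant digit first and uses the slide's SC = S_IP - p - 2, clamped at zero; the function name `sticky_on_the_fly` and the use of a Python integer for the product are illustrative assumptions.

```python
def sticky_on_the_fly(product, s_ip, p):
    """OR together the digits below the round digit, one digit per step.

    product: intermediate product as a non-negative integer
    s_ip:    number of significant digits in the intermediate product
    Returns (sticky_bit, remaining_digits), where remaining_digits holds
    the result digits followed by the guard and round digits.
    """
    sc = max(0, s_ip - p - 2)      # sticky counter: digits beyond g and r
    sticky = False
    while sc > 0:
        product, digit = divmod(product, 10)  # peel off the current LSD
        sticky = sticky or digit != 0
        sc -= 1
    return sticky, product

print(sticky_on_the_fly(12345678, s_ip=8, p=4))  # (True, 123456)
print(sticky_on_the_fly(12345600, s_ip=8, p=4))  # (False, 123456)
```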

Rounding - Scheme
• No rounding overflow... simplifies scheme
• Unique compound adder needed
– SIP may be in redundant form
– Require C_SIP+0 and C_SIP+1; named C+0 and C+1
• Possible corrective left shift (cls) of one digit
– S_IP = S_A + S_B or S_A + S_B - 1
– Adder p digits wide
– Concatenate g or g + 1

Rounding – Scheme Continued
• Three cases based on MSDs of C+0 and C+1
– No leading zeros, no corrective left shift
– Leading zeros, possible corrective left shift
– Zero followed by all nines
• Logically, select one among the following
– C+0, C+1
– C+0 « 1 || g, C+0 « 1 || g + 1
– C+1 « 1 || g, C+1 « 1 || g + 1
– Zero, largest finite number, infinity
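The selection among C+0, C+1, and their shifted forms can be sketched in software. This is a simplified illustration, not the authors' control logic: it assumes round-half-even only, non-redundant inputs, at most one leading zero in the upper p digits, and it leaves out the special results (zero, largest finite number, infinity). The name `round_half_even` and its interface are hypothetical.

```python
def round_half_even(hi, g, r, sb, p):
    """Round a shifted product to p digits; return (significand, cls_applied).

    hi: upper p digits as an integer, g: guard digit, r: round digit,
    sb: sticky bit. cls_applied is True when the corrective left shift is used.
    """
    c0, c1 = hi, hi + 1                      # compound adder outputs C+0 and C+1
    if c0 >= 10 ** (p - 1):
        # MSD is non-zero: no corrective shift, g decides the rounding.
        sticky = r != 0 or sb
        inc = g > 5 or (g == 5 and (sticky or c0 % 2 == 1))
        return (c1 if inc else c0), False
    # MSD is zero: g would become the new LSD and r decides the rounding.
    inc = r > 5 or (r == 5 and (sb or g % 2 == 1))
    if inc and g == 9:
        if c1 >= 10 ** (p - 1):
            # Zero followed by all nines: C+1 already fills p digits, no shift.
            return c1, False
        return c1 * 10, True                 # carry out of the guard digit
    return c0 * 10 + g + (1 if inc else 0), True

print(round_half_even(1235, 5, 0, False, 4))  # (1236, False)
print(round_half_even(123, 4, 5, True, 4))    # (1235, True)
print(round_half_even(999, 9, 5, True, 4))    # (1000, False)
```

The sketch relies on the slides' claim that rounding overflow cannot occur, so it does not handle an upper half consisting of p nines.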

Exception Detection & Handling
• Invalid operation
– sNaN (pass significand of sNaN)
– 0 × ∞ (produce qNaN with significand 0)
• Overflow (and Inexact)
– IE_IP - SLA > Emax
– Increase SLA until all LZs removed
• Underflow (and possibly Inexact)
– IE_IP - SLA < Emin
– Decrease SLA until 0, then shift right
• Inexact
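The overflow and underflow adjustments to SLA can be sketched as below. This is a sketch under stated assumptions: `resolve_shift`, its parameters, and the `lz_total` input (total leading zeros available in the intermediate product) are hypothetical, and the Inexact and Underflow flags themselves are not modeled.

```python
def resolve_shift(ie_ip, sla, lz_total, emin, emax):
    """Return (left_shift, right_shift, exponent, overflow)."""
    exponent = ie_ip - sla
    if exponent > emax:
        # Increase SLA while leading zeros remain to bring the exponent down.
        extra = min(exponent - emax, max(0, lz_total - sla))
        sla += extra
        exponent -= extra
        return sla, 0, exponent, exponent > emax
    if exponent < emin:
        # Decrease SLA toward 0, then shift right for whatever is still missing.
        deficit = emin - exponent
        give_back = min(sla, deficit)
        sla -= give_back
        return sla, deficit - give_back, emin, False
    return sla, 0, exponent, False

# Hypothetical exponent range [-10, 10].
print(resolve_shift(15, 2, 6, -10, 10))   # (5, 0, 10, False)
print(resolve_shift(15, 2, 3, -10, 10))   # (3, 0, 12, True)
print(resolve_shift(-12, 1, 1, -10, 10))  # (0, 2, -10, False)
```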

[Figure: multiplier datapath. Labels: Intermediate Prod. Reg. (Master/Slave w/ Reset), 5 bits/digit (data in carry-save form: sum 4 bits, carry 1 bit); B / Intermediate Prod. Reg. (Master/Slave), 4 bits/digit; Left Shifter, controlled by SLA; Shifted Intermediate Prod. Reg. (Master/Slave), 5 bits/digit; Compound Adder (p digits) with outputs C+0 and C+1; 2:1 Multiplexor; Shifted Non-Redundant Prod. Reg. (Master/Slave), 4 bits/digit; guard digit (4 bits); round digit (4 bits); sticky bit (1 bit); on-the-fly sticky generation (SC digit from round digit position); Round; Final Shift Left By One; Final Product Register (Master/Slave); location of decimal point; signals IP, SIP, CP, l, g, r, sb.
Legend: SC = Sticky Counter; SLA = Shift Left Amount; l, g, r = LSD, guard, and round digit positions; sb = sticky bit.
Note: Less significant half of intermediate product register and shifted non-redundant product register can be shared.]

Implementation Highlights
• Leverage operands' LZCs
– SC, SLA, and IE_SIP
• Handle NaNs with minimal overhead
– No dataflow modification
– Coerce multiplicand or multiplier to 1
• Support gradual underflow
– No dataflow modification
– Simply extend number of iterations
• Simple, control-based rounding scheme

RTL Model and Verification
• Verilog model for both fixed-point and floating-point multiplier designs
• All rounding modes, NaNs, exceptions
• Over 500,000 random & directed testcases
– IBM decNumber based
– IBM Haifa's FPgen (IEEE754R compliance)
– IBM dectest
• Validated pre- and post-synthesis

Synthesis Results
• 64-bit (16 digit) operands, DPD encoded
• LSI Logic's gflxp 0.11um CMOS, 55ps FO4
• Synopsys Design Compiler
• Results
– Fixed-point: 119,653 um^2, 14.72 FO4s
– Floating-point: 237,607 um^2, 15.45 FO4s
• Critical path
– Fixed-point: 4:2 compressor (accumulator)
– Floating-point: 128-bit barrel shifter
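To put the two result rows side by side, the short calculation below derives the area and delay ratios and converts the FO4 counts to picoseconds using the stated 55 ps FO4. Only the numbers come from the slide; the variable and function names are illustrative.

```python
FO4_PS = 55  # FO4 delay of the process in picoseconds, from the slide

designs = {
    "fixed-point": {"area_um2": 119_653, "delay_fo4": 14.72},
    "floating-point": {"area_um2": 237_607, "delay_fo4": 15.45},
}

def delay_ps(name):
    """Convert a design's delay from FO4 units to picoseconds."""
    return designs[name]["delay_fo4"] * FO4_PS

area_ratio = designs["floating-point"]["area_um2"] / designs["fixed-point"]["area_um2"]
delay_ratio = designs["floating-point"]["delay_fo4"] / designs["fixed-point"]["delay_fo4"]

print(f"area ratio:  {area_ratio:.2f}")   # 1.99
print(f"delay ratio: {delay_ratio:.3f}")  # 1.050
for name in designs:
    print(f"{name}: {delay_ps(name):.2f} ps")
```

By these figures, floating-point support roughly doubles the area while adding about 5% to the delay.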

Applicability to Parallel Designs
• IE and IP shift generation
• Rounding scheme
• NaN handling
• Exception detection and handling
• On-the-fly sticky bit generation... NO

Sequential vs. Parallel
• Sequential
– Less area
– Potentially better cycle time
• Parallel
– Less latency
– Higher throughput

Summary
• Extended fixed-point, serial multiplier to support floating-point
• Leveraged operands' LZCs
• Developed an efficient rounding scheme
• Verified RTL and gate-level models
• Presented area and delay numbers for fixed- and floating-point designs
• Discussed applicability to parallel designs

And there you have it!
Long live the decimal system!

Backup Slides

No Rounding Overflow
• If S_IP = 2p - 1
– MSD == 0
– Increment will not cause rounding overflow
• If S_IP = 2p
– Rounding overflow would require the upper p digits to be a string of p 9s
– A product with p leading 9s is greater than the maximum product
– No rounding overflow possible
• Simplifies rounding scheme
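The S_IP = 2p case can be checked numerically. The largest product of two p-digit operands is (10^p - 1)^2 = 10^(2p) - 2·10^p + 1, whose upper p digits are 10^p - 2, one short of a string of p nines. The sketch below confirms this for a range of precisions; `max_upper_digits` is a hypothetical helper name.

```python
def max_upper_digits(p):
    """Upper p digits of the largest possible 2p-digit product."""
    max_product = (10**p - 1) ** 2
    return max_product // 10**p

for p in range(1, 9):
    # The upper half ends in ...98, so incrementing it cannot carry out.
    assert max_upper_digits(p) == 10**p - 2

print(max_upper_digits(4))  # 9998
```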

Decimal Storage Format