Reconfigurable HPC
Part 3: Architectural Resources
Reiner Hartenstein
TU Kaiserslautern
May 14, 2004, TU Tallinn, Estonia
© 2004, reiner@hartenstein.de http://hartenstein.de
Converging Design Flows
terms:
DPU: datapath unit
DPA: datapath array
rDPU: reconfigurable DPU
rDPA: reconfigurable DPA
the same synthesis method may be used for mapping an algorithm onto both: rDPA [Kress, 1995] and DPA [Broderson, 2000]
this synthesis method is a generalization of systolic array synthesis: super-systolic synthesis
>> Time to space migration <<
• Time to space migration
• Flowware languages
• Data Sequencers
• Sequencing through 2-D memory
• MoM architecture
• Acceleration mechanisms
http://www.uni-kl.de
Problems in time to space migration of algorithms
Time to space migration of algorithms
Some have moderate interconnect requirements
Many DSP algorithms require just a pipeline
Some algorithms require excessive interconnect
Example: the Viterbi algorithm
A comprehensive taxonomy of algorithms is missing
IC interconnect: metal layers
Intel
Foundries offer up to 9 metal layers
and up to 3 poly layers
Reconfigurable interconnect fabric laid out over the rDPU cell
KressArray Family generic Fabrics: a few examples
Examples of 2nd Level Interconnect: laid out over the rDPU cell, no separate routing areas!
Select Function Repertory
Select Nearest Neighbour (NN) Interconnect: an example
Select mode, number, width of NNports
rDPU modes: route-through and function (+); route-through only
more NNports: rich Rout Resources
[Figure: rDPU cells with NNport variants; labels 16, 32, 8, 24, 4, 2]
http://kressarray.de
KressArray DPSS (Datapath Synthesis System), published at ASP-DAC 1995
KressArray Xplorer (Platform Design Space Explorer)
[Figure: tool flow diagram. Labels: Application Set; ALE-X Compiler; ALE-X Code; interm. form 2; expr. tree; interm. form 3; Mapper; Scheduler; data stream Schedule; Architecture Editor; Mapping Editor; Architecture Estimator; Analyzer; Delay Estim.; statist. Data; Power Estimator; Power Data; HDL Generator; Simulator; VHDL; Verilog; Improvement Proposal Generator; Inference Engine (FOX); Suggestion; Suggestion Selection; User Interface; User; Design Rules; Datapath Generator Generator; Kress rDPU Layout; KressArray family parameters; DPSS: Compiler, Mapper, Scheduler]
Xplorer GUI
SNN filter KressArray Mapping Example
array size: 10 x 16 = 160 rDPUs
Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect
http://kressarray.de
Xplorer Plot: SNN Filter Example
route-thru-only rDPU
3 vert. NNports, 32 bit
2 hor. NNports, 32 bit
[Figure labels: operator +[13]; operand; operand; result; route thru; backbus connect]
http://kressarray.de
Communication resource editor panel of the Xplorer user interface
Elements of the Xplorer mapping editor: a) Routing editor panel
Elements of the Xplorer mapping editor: b) Input port editor panel
Xplorer: Improvement Proposal Generator
Xplorer: conditional swap operator
Xplorer: Macro cells
FPGA-Style Mapping for coarse grain reconfigurable arrays
mapping: Kress DPSS, CHESS, RaPiD, Colt
placement: simulated annealing, genetic algorithm
routing: simulated annealing, Pathfinder, greedy algorithm
KressArray DPSS (Datapath Synthesis System): Compiler, Mapper, Scheduler
Scheduler: specifies and assembles the data streams from / to array
Ulrich Nageldinger
Dissertation Ulrich Nageldinger:
• ... on mapping applications onto KressArrays
• ... simultaneous routing and placement by simulated annealing
• Supporting a huge family of KressArrays
• fuzzy logic improvement proposal generator
• profiling
• design space exploration
infineon technologies, Munich
http://hartenstein.de/Ph-D-Theses.html
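The placement idea named on this slide can be illustrated with a minimal sketch. It is not the DPSS mapper: it only places operators on a grid of rDPU cells by simulated annealing and uses total Manhattan wire length as a stand-in for routing cost, whereas the slide describes routing and placement being done simultaneously. The function names, the cost function, the linear cooling schedule and all parameter values are assumptions made for illustration.

```python
import math
import random

def wirelength(placement, nets):
    """Sum of Manhattan distances over all two-point nets (assumed cost)."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)

def anneal_placement(ops, nets, cols, rows, steps=4000, t0=4.0, seed=1):
    """Place ops on a cols x rows grid; assumes len(ops) <= cols * rows."""
    rng = random.Random(seed)
    cells = [(x, y) for y in range(rows) for x in range(cols)]
    # One slot per rDPU cell; None marks an unused cell.
    slots = list(ops) + [None] * (len(cells) - len(ops))
    rng.shuffle(slots)

    def place(s):
        return {op: cells[i] for i, op in enumerate(s) if op is not None}

    cost = initial = wirelength(place(slots), nets)
    best, best_cost = list(slots), cost
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9          # linear cooling (assumed)
        i, j = rng.randrange(len(slots)), rng.randrange(len(slots))
        slots[i], slots[j] = slots[j], slots[i]       # propose: swap two cells
        new_cost = wirelength(place(slots), nets)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                           # accept the move
            if cost < best_cost:
                best, best_cost = list(slots), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]   # reject: undo the swap
    return place(best), best_cost, initial
```

A call such as `anneal_placement(list("abcdef"), [("a","b"), ("b","c")], cols=4, rows=4)` returns the best placement found, its cost, and the cost of the random starting placement.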
>> Flowware languages <<
• Time to space migration
• Flowware languages
• Data Sequencers
• Sequencing through 2-D memory
• MoM architecture
• Acceleration mechanisms
http://www.uni-kl.de
Similar Programming Language Paradigms
language category: Computer Languages vs. Xputer Languages
both: deterministic procedural sequencing: traceable, checkpointable
sequencing driven by (Computer Languages): read next instruction, goto (instruction addr.), jump (to instruction addr.), instruction loop, instruction loop nesting, no parallel loops, instruction loop escapes, instruction stream branching
sequencing driven by (Xputer Languages): read next data object, goto (data addr.), jump (to data addr.), data loop, data loop nesting, parallel data loops, data loop escapes, data stream branching
very easy to learn
JPEG zigzag scan pattern
Flowware language example (MoPL)
*> Declarations
EastScan is
  step by [1,0]
end EastScan;
SouthScan is
  step by [0,1]
end SouthScan;
SouthWestScan is
  loop 8 times until [1,*]
    step by [-1,1]
  endloop
end SouthWestScan;
NorthEastScan is
  loop 8 times until [*,1]
    step by [1,-1]
  endloop
end NorthEastScan;
HalfZigZag is
  EastScan
  loop 3 times
    SouthWestScan
    SouthScan
    NorthEastScan
    EastScan
  endloop
end HalfZigZag;
goto PixMap[1,1]
HalfZigZag;
SouthWestScan
uturn (HalfZigZag)
[Figure: zigzag path over the x/y pixel map; each HalfZigZag half is driven by data counters]
The same language principles
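The MoPL zigzag listing on this slide can be replayed as a small Python sketch that records the sequence of visited pixel addresses. Coordinates are `[x, y]` starting at `[1, 1]`, as in the listing. One assumption is loud here: the slide does not define `uturn`, so the sketch interprets `uturn (HalfZigZag)` as replaying the step vectors of HalfZigZag in reverse order, which is the reading that completes the 8 x 8 JPEG zigzag. The helper names are hypothetical.

```python
def zigzag_8x8():
    """Replay the MoPL zigzag listing and return the visited [x, y] addresses."""
    pos = [1, 1]                     # goto PixMap[1,1]
    path = [tuple(pos)]
    trace = []                       # step vectors taken so far

    def step(dx, dy):                # "step by [dx,dy]"
        pos[0] += dx
        pos[1] += dy
        path.append(tuple(pos))
        trace.append((dx, dy))

    def east_scan():
        step(1, 0)

    def south_scan():
        step(0, 1)

    def south_west_scan():           # loop 8 times until [1,*]
        for _ in range(8):
            step(-1, 1)
            if pos[0] == 1:
                break

    def north_east_scan():           # loop 8 times until [*,1]
        for _ in range(8):
            step(1, -1)
            if pos[1] == 1:
                break

    # HalfZigZag
    east_scan()
    for _ in range(3):
        south_west_scan()
        south_scan()
        north_east_scan()
        east_scan()
    half = list(trace)

    south_west_scan()                # main anti-diagonal
    # Assumed meaning of uturn(HalfZigZag): same steps, reverse order.
    for dx, dy in reversed(half):
        step(dx, dy)
    return path
```

Under that assumption the path visits each of the 64 addresses exactly once and ends at `[8, 8]`.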
MoPL-3 Grammar
• The MoPL-3 Grammar ... of ...
• the Map-oriented Programming Language version 3 (MoPL-3), a data-procedural programming language
• to specify functions and operators to be mapped onto a DataPath Array (DPA) or other pipe network (hardwired as well as reconfigurable)
• and to procedurally program data streams associated with these functions or operators
MoPL grammar 1 (14): 1. Program Def., 2. Boundary Decl's
1. Program Definition
MoPL Subroutine: Declaration Part, Scan Statement Part
Declaration Part: Boundary Declaration, SW Declaration, rALU Set-up, Scan Pattern Decl.
2. Boundary Declarations
Array Declaration: array Ident Decl-Size of Data Type ;
Boundary Declaration: boundary Ident Decl-Size ;
rALU = rDPU
MoPL grammar 2 (14): 3. Scan Window Decl's
3. Scan Window Declarations (SW = Scan Window)
Compound Window Declaration: window SW Group Name (is | are) Window Spec ;
Window Spec (elements): Window Names, Window Size, handle Point, of Data Type
Window Names: Name-List
SW Group Name: Ident
Window Size: Decl-Size
MoPL grammar 3 (14): 4. rALU Set-up Decl's
4. rALU Set-up Declarations
rALU Config (elements): rALU subnet, rALU Name, of SW Group Name, is, resident, Structural Part ;
rALU Name: Ident
Structural Part: Top Structure ;
Top Structure: Do Structure | Sub Structure | While Structure
Do Structure: do Sub Structure while Condition ;
While Structure: while Condition Sub Structure ;
MoPL grammar 4
Sub Structure: If Structure | Assignment | begin Sub Structure List end | Set Structure
Sub Structure List: Sub Structure, separated by ;
If Structure: if Condition then Sub Structure else Sub Structure ;
Set Structure: set localBranchFlag ;
Local Branch Flag: FourBitVector
Condition: ( Expression )
MoPL grammar 5
(for missing production rules see Ph. D. thesis by Jürgen Becker)
rALU Activation: (activate | passivate | remove) rALU Subnet Name ;
rALU Subnet Name: Ident
http://hartenstein.de/Ph-D-Theses.html
MoPL grammar 6 (14): 5. Scan Pattern Decl's
5. Scan Pattern Declarations
Compound Scan Pattern Decl: scanPattern Simple_Pattern_Decl ;
Simple Pattern Decl: Pattern Name is Scan Action, dependent on rALUsubnet Flag
Pattern Name: Ident
rALUsubnet Flag: localBranchFlag
Local Branch Flag: FourBitVector
MoPL grammar 7: 6. Scan Statement Decl's
6. Scan Statement Declarations
Scan Statement Part: begin Scan Statement Block end;
Scan Statement Block: with SW Group Name do begin Scan Statement end;
Scan_Pattern_Name: Ident
Scan_Window_Name: Ident
MoPL grammar 8 (14)
Scan Statement: move Scan Window Name to Array Name Point ; | Scan Pattern Call ; | rALU Activation
Scan Pattern Call (elements of the syntax diagram): Scan Pattern Name, [ Scan Window Name ], parbegin ... parend, ( Scan Pattern Call ), separators "," and ";"
Array Name: Ident
MoPL grammar 9 (14): 7. Scan Actions
7. Scan Action Declarations
[Syntax diagrams; recoverable elements:]
Scan Action: Scan Pattern Sequence, Pattern Spec
Pattern Spec: Simple Scan, Library Scan, Scan Ident
Scan Pattern Sequence: begin Scan Action ; ... end
Escape forms: Scan Name until Escape Clause; while Escape Clause, Scan Action
MoPL grammar 10 (14)
Transformation: rotl | rotr | rotu | mirx | miry | halfrotl | halfrotr | reverse
Shortest Step: nne | ese | ssw | wnw
Simple Scan: Number steps [ Number , Number ] | 1 step [ Number , Number ] | Shortest Step
Stretching: t.b.d.
Shearing: t.b.d.
MoPL grammar 11 (14)
Scan Name: Scan Ident | Transformation ( Scan Name )
Scan Ident: Ident
Lib Scan Name: Ident
SizeXY: Number , Number
StepWidthXY: Number , Number
Escape Clause: Condition Clause
Condition Clause (elements): @, [ , ], Rel Op Number
Library Scan: external LibScanName ( SizeXY ; StepWidthXY )
MoPL grammar 12 (14): 8. Expressions
8. Expression Declarations
Assignment: Ident = Expression ;
Expression: Simple Expression, optionally followed by Rel Op Simple Expression
Simple Expression: Sign Term, further Terms joined by + | - | or | xor
Term: Factor, further Factors joined by * | / | mod | and
Rel Op: < | <= | > | >= | == | <>
Factor: ( Expression ) | Sign Factor | not Factor | Unsigned Real | Number | SW Variable
SW Variable: Ident Point
Sign: + | -
MoPL grammar 13: 9. Lexical Declarations
9. Lexical Declarations
Ident: Letter, followed by any sequence of Letter | Digit | Underscore
Letter: A ... Z | a ... z
Digit: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Underscore: _
Number: Digit, followed by further Digits
Unsigned Real: Number . Number Scale Factor
Scale Factor: (E | e) Sign Number
Point: [ Number , Number ]
FourBitVector: four binary digits, each 0 | 1
MoPL grammar 14 (14): 10. Common Production Rules
10. Common Production Rules
Name-List: Ident, further Idents separated by ,
Range: Number : Number
Data Type: unsigned char | unsigned short | unsigned int | unsigned long | float
Decl-Size: [ Range , Range ]
>> Data Sequencers <<
• Time to space migration
• Flowware languages
• Data Sequencers
• Sequencing through 2-D memory
• MoM architecture
• Acceleration mechanisms
http://www.uni-kl.de
application-specific distributed memory*
• Application-specific memory: rapidly growing markets:
  – IP cores
  – Module generators
  – EDA environments
• Optimization of memory bandwidth for application-specific distributed memory
• Power and area optimization as a further benefit
• Key issues of address generators will be discussed
*) see books by Francky Catthoor et al.
Significance of Address Generators
• Address generators have the potential to reduce computation time significantly.
• In a grid-based design rule check, a speed-up of more than 2000 has been achieved compared to a VAX-11/750.
• Dedicated address generators contributed a factor of 10 by avoiding memory cycles for address computation overhead.
Smart Address Generators
1983 The Structured Memory Access (SMA) Machine
1984 The GAG (generic address generator)
1989 Application-specific Address Generator (ASAG)
1990 The slider method: GAG of the MoM-2 machine
1991 The AGU
1994 The GAG of the MoM-3 machine
1997 The Texas Instruments TMS320C54x DSP
1997 Intersil HSP45240 Address Sequencer
1999 Adopt (IMEC)
Adopt (from IMEC)
• cMMU synthesis environment:
• application-specific ACUs for array index reference
• ACU as a counter modified by multi-level logic filter
• ACU with ASUs from a Cathedral-3 library
• distributed ACU alleviates interconnect overhead (delay, power, area)
• nested loop minimization by algebraic transformations
• AE splitting/clustering
• AE multiplexing to obtain interleaved ASs
• other features
Abbreviations:
• customized MMU (cMMU)
• Address Expression (AE)
• Address Sequence (AS)
• Address Calculation Unit (ACU)
• Application-Specific Unit (ASU)
For more details on Adopt see paper in proceedings CD-ROM
Distributed Memory
Data address generators: 20 years of research
Just in time: a new research area: application-specific distributed memory, e.g. book by F. Catthoor et al. ...
SA: scrambling and descrambling the data?
>> Sequencing through 2-D memory <<
• Time to space migration
• Flowware languages
• Data Sequencers
• Sequencing through 2-D memory
• MoM architecture
• Acceleration mechanisms
http://www.uni-kl.de
MoM anti machine
Speedup by MoM
MoM architecture: 2-D memory space, adj. scan window (example: 4x4 scan window)
[Figure: data counter, memory banks with asM blocks forming a distributed memory, smart memory interface, (r)DPU]
grid-based design rule check example
speed-up: >1000
complex boolean expressions in 1 clock cycle
address computation overhead: 94 %
Xputer Lab at Kaiserslautern: MoM I and II
Antimachine: MoM architecture
scan pattern (high level sequencing): Handle Position Generator, producing handle positions
intra scan window accesses (low level sequencing): Scan Window Generator
[Figure: x/y memory space with an example scan window and its handle position; y-GAG and x-GAG; memory accesses to banks 0, 1, ..., n]
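The two sequencing levels named on this slide can be sketched as two nested generators: one moves the scan window handle across the 2-D memory (high level), the other enumerates the locations inside the window at each handle position (low level). This is only an illustration under assumptions: the handle is taken to be the window's top-left corner, the scan pattern is a plain video scan, and the function names and 0-based coordinates are not from the slides.

```python
def handle_positions(mem_w, mem_h, win_w, win_h, step_x=1, step_y=1):
    """High-level sequencing: video-scan the window handle over the memory."""
    for hy in range(0, mem_h - win_h + 1, step_y):
        for hx in range(0, mem_w - win_w + 1, step_x):
            yield (hx, hy)

def window_accesses(handle, win_w, win_h):
    """Low-level sequencing: visit every location inside the scan window."""
    hx, hy = handle
    for dy in range(win_h):
        for dx in range(win_w):
            yield (hx + dx, hy + dy)

def memory_accesses(mem_w, mem_h, win_w, win_h, step_x=1, step_y=1):
    """Combine both levels: one list of accessed locations per handle position."""
    for handle in handle_positions(mem_w, mem_h, win_w, win_h, step_x, step_y):
        yield handle, list(window_accesses(handle, win_w, win_h))
```

With a 4 x 4 memory, a 2 x 2 window and a step of 2 in both directions, the four handle positions tile the memory without overlap.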
Vary-size scan windows
Size adjustable at run time
square or rectangular shape
each location has an individual access mode: R, W, R/W, no-op
by no-op placements, any irregular window shape can be formed
avoids multiple reads and multiple writes for overlapping successive scan window positions
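The effect of per-location access modes and of reusing the overlap between successive window positions can be illustrated by counting memory reads along one scan line. This is a sketch under assumptions: the window moves one location to the right per scan step, only the previous window's contents are kept for reuse, and the mask encoding (`"R"`, `"W"`, `"RW"`, `"-"` for no-op) and function name are hypothetical.

```python
# Access-mode masks, one entry per window location ("-" means no-op).
FULL_3X3 = [["R", "R", "R"],
            ["R", "R", "R"],
            ["R", "R", "R"]]
PLUS_3X3 = [["-", "R", "-"],
            ["R", "RW", "R"],
            ["-", "R", "-"]]   # no-op corners give a plus-shaped window

def reads_along_row(width, mask, reuse_overlap):
    """Count memory reads while the window steps right across `width` columns."""
    win_h, win_w = len(mask), len(mask[0])
    cached, reads = set(), 0
    for hx in range(width - win_w + 1):
        needed = {(hx + dx, dy)
                  for dy in range(win_h) for dx in range(win_w)
                  if "R" in mask[dy][dx]}
        # With reuse, only locations not already held from the previous
        # window position have to be fetched from memory.
        reads += len(needed - cached) if reuse_overlap else len(needed)
        cached = needed
    return reads
```

For a row of 10 columns and the full 3 x 3 window, this gives 72 reads without reuse and 30 with reuse (9 for the first position, 3 for each of the 7 following ones).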
2-D Generic Data Sequence Examples
[Figure: example scan patterns, panels a) to g)]
GAG Slider Model
GAG: Generic Address Generator
scan pattern example for illustration of the slider model: a) total address, b) x address, c) y address
[Figure: x and y axes with addresses 1 to 9 and scan line numbers 1 to 3 (x-scan line number, y-scan line number); sliders; GAG block with Limit Slider (L0), Base Slider (B0), Address Stepper (A)]
GAG = Generic Address Generator
GAU (generic address unit) Scheme, published in 1990
Base Slider (B0), Limit Slider (L0), Address Stepper (A): all 3 are copies of the same BSU stepper circuit
[Figure: address A moving between base B and limit L, shown as [ B ... A ... L ]]
GAG: Address Stepper
GAG = Generic Address Generator
[Figure: address stepper circuit. Inputs: Limit, Base, stepVector, maxStepCount, init tag. Blocks: + / – unit, Step Counter, Escape Clause, End Detect. Outputs: Address A, endExec]
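The behaviour suggested by the address stepper figure (start at a base, add or subtract a step, stop at a limit or after a maximum step count) can be sketched in software as a generator. This is an illustration only, not the published circuit: the exact semantics of the limit check, the parameter names, and the way two steppers are combined into a 2-D video scan are assumptions.

```python
def address_stepper(base, limit, step, max_steps=None):
    """Yield base, base+step, ... while the address stays between base and limit.

    max_steps (assumed counterpart of maxStepCount) optionally bounds the
    number of addresses produced.
    """
    if step == 0 and max_steps is None:
        raise ValueError("a zero step needs max_steps to terminate")
    lo, hi = min(base, limit), max(base, limit)
    addr, count = base, 0
    while lo <= addr <= hi and (max_steps is None or count < max_steps):
        yield addr
        addr += step
        count += 1

def video_scan(width, height):
    """Combine a y stepper and an x stepper into a line-by-line 2-D scan."""
    for y in address_stepper(0, height - 1, 1):       # plays the role of the y-GAG
        for x in address_stepper(0, width - 1, 1):    # plays the role of the x-GAG
            yield (x, y)
```

A negative step scans downwards, for example `list(address_stepper(5, 1, -2))` yields the addresses 5, 3, 1.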
Generic Sequence Examples
GAU (Limit Slider L0, Base Slider B0, Address Stepper A), published in 1990
[Figure: panels a) to g). Labels: atomic scan; linear scan; video scan; -90º rotated video scan; -45º rotated (mirx (v scan)); sheared video scan; non-rectangular video scan; zigzag video scan; spiral scan; perfect shuffle; feed-back-driven scans (until)]
GAG Slider Model
GAG: Generic Address Generator
[Figure: two stepper groups, each with Limit Stepper (L0), Base Stepper (B0) and Address Stepper (A); sliders B0, B, A, L, L0 between floor and ceiling]
GAG Slider Operation Demo Example
[Figure: x and y address axes with floor slider and ceiling slider; labels B0, B, A, L, L0, F (floor), C (ceiling), address]
GAG Complex Sequencer Implementation
all published in 1990
GAU: Generic Addressing Unit
[Figure: two GAGs and one GAU, each built from Limit Slider (L0), Base Slider (B0) and Address Stepper (A); further labels: SDS, VLIW stack]
XMDS Scan Pattern Editor GUI
>> Acceleration mechanisms <<
• Time to space migration
• Flowware languages
• Data Sequencers
• Sequencing through 2-D memory
• MoM architecture
• Acceleration mechanisms
http://www.uni-kl.de
Linear Filter Application
[Figure: scan window access modes (r, w, r/w, w/r) per location over memory Bank a and Bank b, for successive scan steps]
Scanline unrolling
[Figure: scan window access modes (r, r/w) per location after scanline unrolling]
90º Rotation of Scan Pattern
[Figure: scan window access modes (r, w, r/w) per location over Bank a and Bank b, before and after rotating the scan pattern by 90º; the scan window overlap area is marked]
Linear Filter Application
Parallelized Merged Buffer Linear Filter Application with example image of x=22 by y=11 pixel
[Figure panels: initial design; after scan line unrolling; after inner scan line loop unrolling; hardw. level access optim.; final design]
Storage scheme optimization: scanline unrolling
MoM anti machine architecture
[Figure panels: initial design; after scan line unrolling; after inner scan line loop unrolling; hardw. level access optim.; final design; access modes (r, r/w, w/r) per scan window location over Bank a and Bank b]
[Figure: x/y memory space with handle positions, an example scan window, scan pattern (high level sequencing) and intra scan window accesses (low level sequencing)]
MoM anti machine: an Xputer architecture
Speedup by MoM
Multiple scan windows (example: 4x4 scan windows)
[Figure: data counter, memory banks with asM blocks forming a distributed memory, smart memory interface, rDPU]
16 point CGFFT: mapped onto 2-D memory space
CGFFT: Nested and Parallel Scan Pattern
[Figure: 2-D memory regions labelled input, coeff., temp, output; scan window contents in_i, in_i+1, coeff., empty; MAC unit]
CGFFT: Parallel Scan Pattern Animation
[Figure: scan window contents in_i, in_i+1, coeff., empty feed a MAC unit producing out_j and out_k; 32 steps]
CGFFT: Parallel Scan Pattern Animation
[Figure: scan window contents in_i, in_i+1, in_i+2, in_i+3, coeff., empty feed MAC units producing out_j, out_j+1, out_k, out_k+1]
4 MAC units in parallel
8 MAC units in parallel
16 steps, 8 steps, 4 steps
CGFFT: Nested and Parallel Scan Pattern
outer loop scan pattern:
HLScan is 3 steps [2, 0]
inner loop compound scan patterns, 3 in parallel:
SP1 is 7 steps [0, 2]
SP23 is 7 steps [0, 1]
goto
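The nesting of an outer scan pattern with parallel inner scan patterns can be sketched as follows. The step counts and step vectors are taken from the slide; everything else is assumed for illustration: the starting offsets of the three inner windows, the reading of SP23 as one pattern shared by two windows, and the function names.

```python
def linear_scan(start, n_steps, vec):
    """Yield the start position followed by n_steps moves by vec."""
    x, y = start
    yield (x, y)
    for _ in range(n_steps):
        x, y = x + vec[0], y + vec[1]
        yield (x, y)

def nested_parallel_scan(origin=(0, 0)):
    """Outer HLScan with three inner scan windows advancing in lockstep."""
    for hx, hy in linear_scan(origin, 3, (2, 0)):        # HLScan is 3 steps [2, 0]
        sp1 = linear_scan((hx, hy), 7, (0, 2))           # SP1 is 7 steps [0, 2]
        sp2 = linear_scan((hx + 1, hy), 7, (0, 1))       # SP23, hypothetical offset
        sp3 = linear_scan((hx + 1, hy + 8), 7, (0, 1))   # SP23, hypothetical offset
        # zip advances the three windows together, one tuple per parallel step
        yield from zip(sp1, sp2, sp3)
```

With 4 outer positions and 8 inner positions each, the sketch produces 32 parallel steps, which may correspond to the "32 steps" label on the animation slide.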
>> Acceleration mechanisms <<
• Time to space migration
• Flowware languages
• Data Sequencers
• Sequencing through 2-D memory
• MoM architecture
• Acceleration mechanisms
http://www.uni-kl.de
Speed-up Enablers
Here is a list:
DRC: 4 orders of magnitude
Address computation overhead
Translate into super-systolic rather than into instruction streams
Determine interconnect fabrics by compilation, rather than before fabrication
Determine memory architecture by compilation, rather than before fabrication
Acceleration Mechanisms
• parallelism by multi bank memory architecture
• auxiliary hardware for address calculation
• address calculation before run time
• avoiding multiple accesses to the same data
• avoiding memory cycles for address computation
• improve parallelism by storage scheme transformations
• improve parallelism by memory architecture transformations
• alleviate interconnect overhead (delay, power and area)
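One way to picture parallelism by a multi bank memory architecture is a storage scheme in which all locations of a scan window fall into different banks, so that the whole window can be accessed in one memory cycle. The sketch below uses a simple modulo interleaving chosen for illustration; the slides do not state which bank mapping the MoM machines use, and the function names are hypothetical.

```python
def bank_of(x, y, win_w, win_h):
    """Assumed storage scheme: interleave addresses over win_w * win_h banks."""
    return (x % win_w) + win_w * (y % win_h)

def window_is_conflict_free(hx, hy, win_w, win_h):
    """True if every location of the window at handle (hx, hy) hits its own bank."""
    banks = [bank_of(hx + dx, hy + dy, win_w, win_h)
             for dy in range(win_h) for dx in range(win_w)]
    return len(set(banks)) == win_w * win_h
```

Because any `win_w` consecutive x values cover all residues modulo `win_w` (and likewise for y), the window is conflict-free at every handle position, not only at aligned ones.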
END