[email protected] July 20, 2003
Reiner Hartenstein: A Mead-&-Conway-like Break-through is overdue; Seminar Nº 03301, Dynamically Reconfigurable Architectures; Dagstuhl, Germany, July 20-25, 2003
Reiner Hartenstein, University of Kaiserslautern, Germany http://hartenstein.de
Seminar Nº 03301, Dynamically Reconfigurable Architectures
A Mead-&-Conway-like Break-through is overdue
Reiner Hartenstein
Kaiserslautern University of Technology
Dagstuhl, July 20-25, 2003
© 2003, [email protected] http://hartenstein.de
Kaiserslautern University of Technology
Ubiquitous embedded systems
Embedded System Engineering (ESE) requires:
• Hardware (HW) / (E)Software (ESW) co-design
• Configware (CW) / ESW co-design
• HW / CW / ESW co-design
ESE becomes the main focus in system design:
ESW becomes the main vehicle for product differentiation
Reconfigurable Computing: a second programming domain
Migration of programming to the structural domain
The structural domain has become RAM-based
The opportunity to introduce the structural domain to programmers ...
... to bridge the gap by clever abstraction mechanisms using a simple new machine paradigm
>> outline (1) <<
• Embedded System Design Crisis
• Supercomputing Crisis
• µP Crisis
• CS crisis
• CS for Embedded Systems?
• New Machine Paradigm
• final remarks
more crises
Embedded System Design Crisis
Mask & NRE cost [ST microelectronics]
Foundries: Adoption Rate By Process [Nick Tredennick]
„EDA industry shifts into CS mentality“ [Wojciech Maly]
• patches instead of engineering
• innovation stalled many years ago
• netlist-based: do not care about efficiency, ...
• ... do not care about transistor density
• 85% of users hate their tools
Where are we heading ?
[Figure: factor (1, 2) plotted over months (0, 10, 12, 18)]
90% by 2010
10 times more programmers will write embedded applications than computer software by 2010
*) Department of Trade and Industry, London
Panels on the 2nd Design Crisis: proposing a solution
Lacking Sense of Direction
>> outline (2) <<
Dead Supercomputer Society
• ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics • BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific • Cydrome • Dana/Ardent/Stellar/Stardent
• DAPP • Denelcor • Elexsi • ETA Systems • Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1 • Goodyear Aerospace MPP • Gould NPL • Guiltech • ICL • Intel Scientific Computers • International Parallel Machines • Kendall Square Research • Key Computer Laboratories
• MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics
[Gordon Bell, keynote at ISCA 2000]
microprocessor architectures (1)
© Arndt Bode, LRR-TUM
Development of microprocessor architectures (1)
Until 1995: narrowing; since 1995: widening of the variety of types and architectures
Transistor count (Moore's law): trade-off between computing performance, power consumption, cost and compatibility
MPR Analysts' Choice Awards categories:
- PC Processors: Intel P4 (HyperThreading), AMD Athlon (x86-64, HyperTransport), Transmeta (Binary Compilation, VLIW), ...
- Server Processors: Intel Xeon MP and Itanium 2 (EPIC), AMD Opteron (x86-64), HP Alpha EV-7, Fujitsu Sparc 64 V (out-of-order superscalar)
- High-Performance Embedded Processors: Broadcom BCM 1250, IBM 440 GX, Intrinsity FastMIPS, Motorola MPC 7455, NEC VR7701, PMC Sierra RM9000x2
- Low-Power Embedded Processors: AMD Au1100, Intel PXA 250, NEC VR 4131, DragonBall MX1, NeoMagic MiMagic5 (1 mW per MHz)
- Extreme Processors: CMU PipeRench, Intrinsity FastMath, Micron Yukon, NEC DRP, PACT XPP, Sandbridge Sand Blaster (up to 512 ALUs)
- Embedded IP Processor Cores: ARCtangent-A5, ARM 1026 EJ-S/1136JF-S, Improv Crescendo, MIPS M4K, Tensilica Xtensa V
- Graphics Processors: 3Dlabs Wildcat VP900, ATI Radeon 9700, Nvidia GeForce FX
Some Supercomputing people now looking at us
Reconfigurable Computing
PetaFlop/s (10^15) Initiative
Steroids for the aging microprocessor:
>> outline (3) <<
CS: young ? dynamic?
.. but the von Neumann Paradigm is still the dominant doctrine ...
Microelectronics is ignored (except falling cost of computational effort)
... still pushing the basic models from the times of mainframe dinosaurs
after >10 technology generations ...
• 1st 4004 • 2nd 8008 • 3rd 8086 • 4th 80286 • 5th 80386 • 6th 80486 • 7th P5 (Pentium) • 8th P6 (Pentium Pro / Pentium II) • 9th Pentium III • 10th .... • 11th • .......
... the vN Microprocessor is a Methuselah, the steam engine of the silicon age.
stolen from Bob Colwell
processor/memory communication bottleneck
[Figure labels: vN bottleneck; vN: unbalanced]
MPU designs more complex
greatly complicates the verification process
chip-level multiprocessing + simultaneous multithreading
many bugs relate to concurrency issues
new kinds of concurrency are becoming important
„Pollack's Law“ (simplified) [Intel]
[Figure: growth factor of performance and of area efficiency over feature size (µm, down to 0.1)]
MPU performance stalled
Moore’s law will stall soon for MPUs
Bill Gates’ law: relative computation time needed doubles every 2 years
had been compensated by Moore’s law
>> outline (4) <<
Crusty Computing Sciences
[David Padua, John Hennessy]
shrinking supercomputing conferences
more and more efforts yield only marginal improvements
dataflow machines dead
98.5% vN-only
this monopoly is the problem
areas fade away
blinders:
„we are o.k. !“ (no new direction)
Lacking Sense of Direction ?
for ignoring the impact of RC
Stealthy CS Crisis
progress in CS stalled by qualification problems in industry and academia
communication barriers between disciplines
severe software quality problems
often hardware people needed to solve CS problems
What's the problem ?
Crossing the Hardware / Software Chasm [Mike Butts]
[Figure labels: accelerators; µprocessor]
Traditional CS: programming is (control-)procedural, instruction-stream-based; sources: software
The typical programmer has trouble understanding function evaluation without machine mechanisms ... by signals rippling through a network of transistors.
It's the gap between the procedural and the structural mind set
What's the problem ?
Crossing the Hardware / Software Chasm [Mike Butts]
[Figure labels: accelerators; µprocessor]
The brain hurts on paradigm shift ?
no, it can't ...
Brain usage: procedural-only
structural hemisphere missing
>> outline (5) <<
ITRS SoC design cost model [ITRS 2001]
[Figure: SoC design cost; curves: RTL methodology only; w. future improvements; design technology steps marked on the chart: tall thin engineer, small block reuse, large block reuse, IC implementation tools, intelligent testbench, ES level methodology]
http://public.itrs.net/Files/2001ITRS/Design.pdf
SoC System level Design: Embedded SW (ESW)
new design automation from high level descriptions
ESE becomes the main focus in system design:
HW-(E)SW codesign onto highly programmable platforms (SoC)
ESW becomes the main vehicle for product differentiation
formal verification for (E)SW
HW-(E)SW co-verification
SW synthesis included (SoC)
[Slide overlay: the annotations "CW-", "CW and", "and CW", "(ECW)" and "ECW" extend these items from (E)SW to configware (CW)]
Complexity: System Level Design Challenge
“abstraction levels must be raised above present-day RT-level
from HW + (processor-dependent embedded) C code level
language infrastructures for complex models (SystemC etc.)
must be leveraged by industry consensus on use-methodology and abstraction levels”
[ITRS 2001]
>> outline (6) <<
Why a dichotomy of machine paradigms?
data stream machine:
• bad news: caches do not help
• good news: no vN bottleneck
• caches not needed
[Figure, stolen from Bob Colwell: vN bottleneck; vN: unbalanced]
The anti machine has no von Neumann bottleneck
computing paradigms and methodologies
1946: machine paradigm (von Neumann)
1980: data streams (Kung, Leiserson)
1989: anti machine paradigm
1990: rDPU (Rabaey)
1994: anti machine high level programming language
1995: super systolic rDPA
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
1997+: discipline of distributed memory architecture
1997: configware / software partitioning compiler
flowware*
Flowware heading toward mainstream
• Data-stream-based Computing is heading for mainstream
– 1997 SCCC (LANL) Streams-C Configurable Computing
– SCORE (UCB) Stream Computations Organized for Reconfigurable Execution
– ASPRC (UCB) Adapting Software Pipelining for Reconfigurable Computing
– 2000 Bee (UCB), ...
– Most stream-based multimedia systems, etc.
– Many other areas ....
Flowware:
managing data streams
Software:
managing instruction streams
control-procedural vs. data-procedural
The structural domain is primarily data-stream-based:
..... mostly not yet modelled that way: most flowware is hidden by its indirect instruction-stream-based implementation
Flowware provides a (data-)procedural abstraction from the (data-stream-based) structural domain
Flowware converts „procedural vs. structural“ into „control-procedural vs. data-procedural“ ...
... a Trojan horse to introduce the structural domain to the procedural mind set of programmers
flowware defines ....
... which data item at which time at which port
Placement & routing (configware) done:
[Figure: DPA fed by input data streams and producing output data streams; each stream is drawn as data items (x) and gaps (-) over the axes time and port #]
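To make "which data item at which time at which port" concrete, here is a minimal sketch. It assumes a systolic-style skew that is not taken from the slide: row i of an input matrix enters port i, delayed by i time steps. The function name `skewed_schedule` and the skew rule are illustrative assumptions only.

```python
from typing import List, Tuple


def skewed_schedule(matrix: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Return (time, port, item) triples for a row-skewed input stream.

    Assumption (not from the slides): row i is fed to port i, and its
    k-th element arrives at time i + k, as in a classic systolic skew.
    """
    schedule = []
    for port, row in enumerate(matrix):
        for k, item in enumerate(row):
            schedule.append((port + k, port, item))
    # Sort by time, then by port, so the list reads as a timeline.
    schedule.sort(key=lambda entry: (entry[0], entry[1]))
    return schedule


if __name__ == "__main__":
    for time, port, item in skewed_schedule([[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
        print(f"t={time} port={port} item={item}")
```

The resulting list is the whole "program" in this toy view: no instructions, only a timetable of data items per port.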
Programming Language Paradigms
language category | Computer Languages | Languages f. Anti Machine
both: deterministic procedural sequencing: traceable, checkpointable
operation sequence driven by: | read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting, no parallel loops, escapes, instruction stream branching | read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching
state register | program counter | data counter(s)
address computation | massive memory cycle overhead | overhead avoided
instruction fetch | memory cycle overhead | overhead avoided
parallel memory bank access | interleaving only | no restrictions
Machine paradigms
von Neumann instruction stream machine: CPU (instruction sequencer + DPU), memory M, I/O; driven by an instruction stream; programmed by Software
data-stream machine: DPU or rDPU, or (r)DPA, with memory M (asM**), data address generator (data sequencer) and I/O; distributed memory architecture*; driven by data streams; programmed by Flowware (and Configware, if reconf.)
[Figure: CPU and DPU drawn as atom (+) and anti atom (-), each next to its memory]
*) the new discipline came just in time: see Herz et al.: Proc. IEEE ICECS 2002
heavy anti atoms: DPA = DPU array
[Figure: a DPA drawn as a heavy anti atom built from several DPUs]
Distributed Memory
SA: scrambling and descrambling the data ?
Just in time: a new research area:
Application-specific distributed memory:
e. g. book by F. Catthoor et al. ...
Data address generators - 20 years of research:
Synthesizable distributed memory architecture for a Stream-based Soft Machine
[Figure: Compiler and Scheduler; Sequencers (data stream generators); Memory (data memory) split into several memory banks; rDPA “instructions”]
>> outline (7) <<
Conclusion: all knowledge needed is available
•machine paradigm
•anti machine architectural resources
•sequencing methodology: hw & sw
•parallel memory IP core and module generator vendors
•anything else needed
•compilation techniques
•hw / sw partitioning methodology
• languages
The Situation in Computing Sciences
• Computing Sciences are in a severe crisis
• New fundamentals and R&D directions are inevitable
• my mission: getting you involved
• All knowledge needed is readily available ...
• ... even from Computing Sciences
• Silicon application and EDA provide useful concepts
• Reconfigurable Computing has the remedy
>>> we need ... <<<<<
We need a Mead-&-Conway-like text book
We need undergraduate lab courses on HW / CW / SW partitioning
We need new courses with extended scope on parallelism and algorithmic cleverness for HW / CW / SW migration / partitioning
What else do we need ? Your proposals ?
>>> we need support <<<<<
We need the support of the open-minded
members of the classical CS community
Let us assemble a list with e-mail addresses
>>> thank you <<<<<
thank you for your patience
>>> END <<<
microprocessor architectures (8)
TU Dresden, 09.05.2003
© Arndt Bode, LRR-TUM
Microprocessor architectures (8): highly parallel systems
[Figure: array of processing elements (PE) bordered by SRAM blocks and I/O ports, with a configuration manager]
PACT XPP: Reference Module: XPU128 Co-Processor
XPP128 rDPA
• Full 32 or 24 Bit Design working silicon
• 2 Configuration Hierarchies
• Evaluation Board
• XDS Development Tool with Simulator
• all used by SIEMENS Corporation
• Other contractors preparing .... : ask Ron Mabry (here in the audience)
[Figure: array of rDPUs (PAE cores with ALU and Ctrl, CFG units); buses not shown]
wide variety of speed-up factors
platform | application | speed-up factor | method
PACT Xtreme 4-by-4 array [2003] | 16 tap FIR filter | x16 (MOPS/mW) | straight forward
MoM anti machine with DPLA* [1983] | grid-based DRC**, 1-metal 1-poly nMOS, 256 reference patterns | > x1000 (computation time) | multiple aspects
key issue: algorithmic cleverness
*) MPC fabrication via E.I.S. multi university project
**) Design Rule Check
>>> flowware-based <<<
Configware / Flowware Compilation
[Figure: compilation flow from a high level source program via an intermediate form and a wrapper; the mapper produces configware for the r. Data Path Array (rDPA); the scheduler produces flowware for the data sequencers (address generators), which drive the data streams between the memory banks (M, asM) and the rDPA]
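Configware / Flowware Co-Compilation
[Figure: compilation flow from a high level source program via an intermediate form and a wrapper; the mapper produces configware for the r. Data Path Array (rDPA); the scheduler produces flowware for the data sequencers (address generators), which drive the data streams between the memory banks (M) and the rDPA]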
>>> 2nd machine paradigm <<<
Matter & Antimatter
The World of Matter
machine paradigm: the Atom
The World of Anti Matter
machine paradigm: Anti Atom
[Figure: atom and anti atom drawn with opposite charges]
Matter & Antimatter of Informatics:
[Figure: the CPU and the DPU drawn as atom and anti atom with opposite charges]
Anti Machine paradigm
nothing central !
mapping algorithms efficiently onto rDPA
SNN filter on KressArray
array size: 10 x 16 = 160 rDPUs
[Figure legend: rDPU not used; used for routing only (route-through only); operator and routing; port location marker; backbus connect]
by the way: example of scalability / relocatability by EDA support
also FPGA scalability (avoid routing congestion) by EDA solution
One more argument for coarse grain
we have already seen the first day:
[Figure: MOPS / mW (0.001 to 1000, logarithmic) over µ feature size (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07); T. Claasen et al.: ISSCC 1999]
Wiring by abutment: a 32 Bit KressArray example
if coarse grain cells are full custom and mesh-connected, and 2nd level interconnect resources are laid out over the cells, the array is almost as area-efficient as hardwired
*) R. Hartenstein: ISIS 1997
The Secret of Success: Co-Compilation
[Figure: co-compiler with High level PL source, Analyzer / Profiler, Partitioner and Resource Parameters; SW compiler producing SW code (“vN” machine paradigm); CW compiler producing CW Code (anti machine paradigm)]
supporting different platforms
supporting platform-based design
could provide the platforms
Machine Paradigms
machine category | Computer (the Machine: “v. Neumann”) | The Anti Machine
driven by: | instruction streams | data streams (no “dataflow”)
engine principles | instruction sequencing | sequencing data streams
state register | single program counter | (multiple) data counter(s)
communication path set-up (“instruction fetch”) | at run time | at load time
data path resource | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.
data path operation | sequential | parallel pipe network etc.
also hardwired implementations*
*) e.g. Bee project, Prof. Brodersen
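The contrast in the table can be mimicked in a few lines. The following is only a toy model under stated assumptions: `data_sequencer`, `configured_dpu` and `run_data_stream_machine` are hypothetical names, and Python itself of course still runs on a von Neumann machine. It shows a data counter driving the computation while the DPU operation stays fixed after "load time".

```python
from typing import Callable, Iterator, List


def data_sequencer(base: int, step: int, count: int) -> Iterator[int]:
    """Hypothetical data address generator: a data counter replaces the program counter."""
    address = base
    for _ in range(count):
        yield address
        address += step


def configured_dpu(x: int) -> int:
    """The DPU operation, fixed once at 'load time' (here: an arbitrary multiply-add)."""
    return 3 * x + 1


def run_data_stream_machine(memory: List[int],
                            dpu: Callable[[int], int],
                            addresses: Iterator[int]) -> List[int]:
    """Stream memory words through the DPU; only data addresses advance at run time."""
    return [dpu(memory[a]) for a in addresses]


memory = [10, 11, 12, 13, 14, 15, 16, 17]
result = run_data_stream_machine(memory, configured_dpu,
                                 data_sequencer(base=1, step=2, count=3))
print(result)  # [34, 40, 46]
```

Changing the scan order means changing only the sequencer parameters; the configured operation is untouched.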
Why Coarse Grain instead of FPGA ?
[Figure: Transistors / chip (1000 to 100 000 000 000, logarithmic) over the years 1980 to 2010; curves: physical, logical, FPGA physical, FPGA logical, FPGA routed; marked factors: ~ 10 and ~ 10 000]
Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld
reduced reconfigurability overhead by up to ~ 1000
drastically smaller configuration memory
much faster loading
a lot more benefits
KressArray Family generic Fabrics: a few examples
Select Function Repertory
select Nearest Neighbour (NN) Interconnect: an example
Select mode, number, width of NNports
more NNports: rich Route Resources
rDPU modes: route-through only; route-through and function
Examples of 2nd Level Interconnect: laid out over the rDPU cell - no separate routing areas !
[Figure labels: 2, 4, 8, 16, 24, 32; rDPU]
http://kressarray.de
Changing Models of Computing
[Figure, two models: (1) “von Neumann”: (procedural) Software, downloading to RAM; data path, instruction sequencer, I / O; software design. (2) host with hardwired accelerator(s): Software downloading to RAM, hardware from CAD; hardware/software co-design]
the problem with typical CS people:
- the dominance of von Neumann
- they cannot partition
- they cannot migrate
hardware people needed (spec)
Changing Models of Computing
[Figure, three models: (1) “von Neumann”: (procedural) Software, downloading to RAM; data path, instruction sequencer, I / O; software design. (2) host with hardwired accelerator(s): Software downloading to RAM, Hardware from CAD; hardware/software co-design. (3) host with reconf. accelerator(s): Software and (structural) Configware, each downloading to RAM; Morphware; configware/software co-design; hardware/configware/software co-design]
Super Pipe Networks
array | applications | pipeline properties: shape | pipeline properties: resources | mapping | scheduling (data stream formation)
systolic array | regular data dependencies only | linear only | uniform only | linear projection or algebraic synthesis (mapping and scheduling)
super-systolic rDPA* | no restrictions (applications, shape, resources) | simulated annealing or P&R algorithm | (e.g. force-directed) scheduling algorithm
*) KressArray [1995]
>>> distributed memory <<<
instruction stream-based Compilation Principles
scheduler
parser
source text
library
link/load instruction call placement
1-D memory space
execution order by location
Datastream-based Compilation Principles
library
data stream assembly
scheduler
mapper placement & routing
>>> flowware languages <<<
Similar Programming Language Paradigms
language category | Computer Languages | Xputer Languages
both: deterministic procedural sequencing: traceable, checkpointable
sequencing driven by: | read next instruction, goto (instruction addr.), jump (to instruction addr.), instruction loop, instruction loop nesting, no parallel loops, instruction loop escapes, instruction stream branching | read next data object, goto (data addr.), jump (to data addr.), data loop, data loop nesting, parallel data loops, data loop escapes, data stream branching
JPEG zigzag scan pattern
Flowware language example (MoPL)
[Figure: zigzag scan over the x/y pixel map, driven by data counters; the two HalfZigZag halves are marked]
*> Declarations
EastScan is
  step by [1,0]
end EastScan;
SouthScan is
  step by [0,1]
end SouthScan;
NorthEastScan is
  loop 8 times until [*,1]
    step by [1,-1]
  endloop
end NorthEastScan;
SouthWestScan is
  loop 8 times until [1,*]
    step by [-1,1]
  endloop
end SouthWestScan;
HalfZigZag is
  EastScan
  loop 3 times
    SouthWestScan
    SouthScan
    NorthEastScan
    EastScan
  endloop
end HalfZigZag;
Main program:
goto PixMap[1,1]
HalfZigZag;
SouthWestScan
uturn (HalfZigZag)
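The address sequence this MoPL program describes can be reproduced with a short Python sketch. Two points are assumptions rather than facts from the slide: that "until" is checked after each step, and that `uturn (HalfZigZag)` replays the step vectors of HalfZigZag in reverse order (which yields the point-mirrored second half of the scan). All function names are illustrative.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]
Step = Tuple[int, int]
N = 8  # block size of the 8x8 pixel map


def half_zigzag_steps() -> List[Step]:
    """Step vectors [dx, dy] produced by HalfZigZag when started at PixMap[1,1]."""
    steps: List[Step] = []
    x, y = 1, 1

    def step(dx: int, dy: int) -> None:
        nonlocal x, y
        x, y = x + dx, y + dy
        steps.append((dx, dy))

    def scan_until(dx: int, dy: int, done: Callable[[], bool]) -> None:
        # "loop 8 times until ...": at most 8 steps, stop once the border is reached.
        for _ in range(N):
            step(dx, dy)
            if done():
                break

    step(1, 0)                                 # EastScan
    for _ in range(3):                         # loop 3 times
        scan_until(-1, 1, lambda: x == 1)      # SouthWestScan until [1,*]
        step(0, 1)                             # SouthScan
        scan_until(1, -1, lambda: y == 1)      # NorthEastScan until [*,1]
        step(1, 0)                             # EastScan
    return steps


def zigzag_addresses() -> List[Point]:
    """Main program: goto PixMap[1,1]; HalfZigZag; SouthWestScan; uturn (HalfZigZag)."""
    half = half_zigzag_steps()
    diagonal = [(-1, 1)] * (N - 1)             # SouthWestScan from [8,1] to [1,8]
    # Assumption: uturn replays the HalfZigZag step vectors in reverse order.
    all_steps = half + diagonal + half[::-1]
    x, y = 1, 1                                # goto PixMap[1,1]
    path = [(x, y)]
    for dx, dy in all_steps:
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(zigzag_addresses()[:6])
```

Under these assumptions the sketch visits every one of the 64 pixel addresses exactly once, starting at [1,1] and ending at [8,8].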
>>> address generators <<<
GAG generic address generator Scheme
[Figure: Limit Slider (L0), Base Slider (B0) and Address Stepper (step ΔA) producing the address A between base B and limit L]
all 3 are copies of the same BSU stepper circuit
GAG Slider Model
GAG = Generic Address Generator
[Figure: LimitStepper, BaseStepper and AddressStepper (signals B0, A, L0) drawn as sliders: base B and limit L slide between floor and ceiling; address A steps by ΔA between B and L]
78
GAG: Address Stepper
GAG = Generic Address Generator
[Figure: block diagram of the address stepper. Labels: Base (B0), Limit (L0), stepVector (ΔA), maxStepCount, init tag, + /, Step Counter, =o, Escape Clause, End Detect, limit, Address (A), endExec]
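The slider model can be imitated in software. The Python sketch below is a rough behavioral model only, under stated assumptions: the slides give block labels (Base, Limit, stepVector, maxStepCount, floor, ceiling), not exact semantics, so the stepping rule, the function names stepper and gag, and the parameter names are hypothetical. It reuses one stepper routine three times, mirroring the remark that all three units are copies of the same stepper circuit.

```python
def stepper(start, step, bound, max_step_count):
    """Yield start, start + step, ... while the value has not passed `bound`,
    taking at most `max_step_count` steps (assumed stepper behavior)."""
    value, count = start, 0
    while (value <= bound) if step >= 0 else (value >= bound):
        yield value
        if count == max_step_count:
            return
        value += step
        count += 1


def gag(base0, d_base, limit0, d_limit, d_addr, floor, ceiling, max_step_count):
    """Assumed GAG model: base and limit sliders move between floor and ceiling;
    for every (base, limit) pair the address stepper runs from base to limit."""
    def slider(start, step):
        bound = ceiling if step >= 0 else floor
        return stepper(start, step, bound, max_step_count)

    for base, limit in zip(slider(base0, d_base), slider(limit0, d_limit)):
        yield from stepper(base, d_addr, limit, max_step_count)


# Fixed base, limit sliding upward: a triangular address sequence.
addresses = list(gag(base0=0, d_base=0, limit0=0, d_limit=1, d_addr=1,
                     floor=0, ceiling=3, max_step_count=10))
print(addresses)
```

Changing only the slider parameters yields other scan shapes from the same generic circuit, which is the point of the generic sequence examples.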
79
GAG Complex Sequencer Implementation
[Figure: three GAUs, each shown as a GAG (Generic Address Generator) with Limit Slider, Base Slider and Address Stepper (B0, ΔA, L0, A); further labels: SDS, GAG, VLIW stack]
all of this was published in 1990
80
Generic Sequence Examples
[Figure: example address sequences a) to g) produced by a GAG with Limit Slider, Base Slider and Address Stepper (B0, ΔA, L0, A)]
81
GAG Slider Operation Demo Example
[Figure: sliders over an x/y address space; labels: floor (F), ceiling (C), base B, limit L, B0, L0, address A]
82
Storage scheme optimization: scanline unrolling
[Figure: read (r) and read/write (r/w) access patterns on memory banks a and b, shown for the stages labelled: initial design, after scan line unrolling, after inner scan line loop unrolling, hardw. level access optim., final design]
MoM anti machine architecture
[Figure: scan window moving over an x/y memory space; labels: handle positions, scan window, scan pattern (high level sequencing), intra scan window accesses (low level sequencing), example]
83
Computer: the wrong Machine Paradigm (“von Neumann”)
[Figure: Compiler, RAM (instructions), Sequencer with program counter (state register), hardwired Datapath]
Computer: tightly coupled by compact instruction code
“von Neumann” does not support soft data paths
Xputer: The Soft Machine Paradigm (anti machine)
[Figure: Compiler, Scheduler, RAM, (multiple) sequencer with data counters, Datapath Array; “instructions”]
Xputer: loosely coupled by decision data bits only
reconfigurable, also for hardwired
University of Kaiserslautern, Xputer Lab
84
Binding Time vs. Computing Domain
Binding time (Set-up of Communication Channels):
• at run time: microprocessor, parallel computer
• at loading time, at compile time: Reconfigurable Computing
• later fabrication step: ASICs
• before fabrication: full custom ICs
programming domain:
• time domain (procedural)
• time & space (hybrid)
• space domain (structural)
further labels: array processor, systolic arrays, supersystolic arrays
85
Why Coarse Grain instead of FPGA ?
[Figure: transistors / chip (log scale, 1000 to 100 000 000 000) over the years 1980 to 2010; curves: physical, logical, FPGA physical, FPGA logical, FPGA routed; gap annotations ~ 10 and ~ 10 000. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]
• drastically smaller configuration memory
• much faster loading
• reconfigurability overhead reduced by up to ~ 1000
• many more benefits
86
Paradigm Shifts: Nick Tredennick‘s view
instruction-stream-based computing: resources fixed, algorithms variable
reconfigurable computing: resources variable, algorithms variable
programmable
why 2 program sources ?
87
Compilation for (r)DPA of anti machine
[Figure: compilation flow. Labels: high level source program (software notation), wrapper, expression tree, mapper, DPU library, scheduler, parameters, code generators, configware, flowware, streamware, morphware]
88
machine paradigm: some differences
[Figure: CPU compared with DPU and DPA, drawn with + and - charge symbols]
no. of streams ≥ 1
89
Annihilation?
[Figure: + and - charge symbols]
avoidable by tools ....
90
Matter & Antimatter: Atom and Anti Atom
The World of Matter: Machine paradigm: the Atom
Anti Matter: Machine paradigm: Anti Atom
[Figure: atom and anti atom drawn with + and - charge symbols]
91
Parallelism by Concurrency
[Figure: several atoms and anti atoms drawn with + and - charge symbols]
independent instruction streams
92
Co-Compilation
We introduce: Co-Compilation
high level programming language source, fed into a partitioning compiler, which produces:
• Software running on µProcessor (Computer Machine Paradigm)
• Configware running on Reconfigurable Accelerators (Xputer “Soft” Machine Paradigm)
µProcessor and Reconfigurable Accelerators are connected by an interface
Reconfigurable Architecture (RA) -- instead of hardwired
93
Loop Transformation Examples
sequential processes: resource parameter driven Co-Compilation
• source: loop 1-16 body endloop
• strip mining: fork; loop 1-8 body endloop and loop 9-16 body endloop; join
• loop unrolling: loop 1-8 body body endloop
• host and reconf. array: loop 1-8 trigger endloop; loop 1-4 trigger endloop; loop 1-2 trigger endloop
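The two transformations named on this slide can be checked with a small Python sketch. It is an illustration only: body is a hypothetical stand-in for the loop body, and the two strips are run one after the other here, whereas the fork / join on the slide suggests they may run concurrently on separate resources.

```python
def body(i, out):
    """Hypothetical loop body: record the square of the loop index."""
    out.append(i * i)


def original():
    # loop 1-16 body endloop
    out = []
    for i in range(1, 17):
        body(i, out)
    return out


def strip_mined():
    # fork: loop 1-8 body endloop | loop 9-16 body endloop; join
    first, second = [], []
    for i in range(1, 9):
        body(i, first)
    for i in range(9, 17):
        body(i, second)
    return first + second  # join: concatenate the two strips


def unrolled():
    # loop 1-8 body body endloop (unrolled by a factor of 2)
    out = []
    for k in range(1, 9):
        body(2 * k - 1, out)
        body(2 * k, out)
    return out


print(original() == strip_mined() == unrolled())
```

Both variants compute the same result as the source loop; which variant a co-compiler picks would depend on the resource parameters, e.g. how many body instances fit on the reconfigurable array.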
94
“new” terms
• Hardware: you all know (not programmable)
• Morphware: structurally programmable “hardware”
• Software: you all know (programming instruction streams)
• Configware: sources for programming morphware
• Flowware*: to schedule data streams; similar to software, but by data counter manipulation (programming data streams instead of instruction streams)
(only some terms are “new”, however, not their subject)
clean terminology needed for taxonomy and comprehensibility
*) flowware has no relation to “dataflow machine”
Granularity defines block path width:
• fine grain: 1-2 bit
• coarse grain: > 2 bit
• multi grain: > 2 bit, variable
algorithms variable, resources variable
algorithms variable, resources fixed
95
Why data streams are a common model
Flowware: to schedule data streams
Configware: programming the resources
Data streams (flowware) are derived from configware having been compiled before
Data stream execution resources: distributed memory architectures. This new discipline came just in time. see Herz et al.: Proc. IEEE ICECS 2002
all other details are defined here: Link (via “recent talks”) also here:
Nick Tredennick's paradigm shifts:
• fully hardwired: resources fixed, algorithms fixed; not programmable
• CPU: resources fixed, algorithms variable
• reconfigurable: resources variable, algorithms variable
*) only one source needed
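The split into two program sources can be mimicked in a few lines of Python. This is a loose analogy under stated assumptions, not the Xputer tool flow: configure stands in for configware (it fixes the datapath before execution), data_stream stands in for flowware (a data counter schedules which memory words enter the datapath), and all names and parameters are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def configure(scale: int, offset: int) -> Callable[[int], int]:
    """Configware stand-in: set up the datapath structure once, before run time."""
    def datapath(word: int) -> int:
        return scale * word + offset
    return datapath


def data_stream(memory: List[int], start: int, step: int, count: int) -> Iterator[int]:
    """Flowware stand-in: a data counter generates the addresses of the stream."""
    address = start
    for _ in range(count):
        yield memory[address]
        address += step


def run(datapath: Callable[[int], int], stream: Iterable[int]) -> List[int]:
    """Push the scheduled data stream through the configured datapath."""
    return [datapath(word) for word in stream]


memory = list(range(10, 20))
result = run(configure(2, 1), data_stream(memory, start=0, step=3, count=4))
print(result)
```

Here the resources (the datapath) and the algorithm's data schedule are set by two separate sources, while a CPU program would express both in one instruction stream.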