Reconfigurable Computing
Introduction
Chapter 1
Prof. Dr.-Ing. Jürgen Teich
Lehrstuhl für Hardware-Software-Co-Design
The Von Neumann Computer
• Principle
In 1945, the mathematician John von Neumann (VN)
showed in a study of computation that a
computer could have a simple structure and still be
capable of executing any kind of program,
given a properly programmed control unit,
without the need for hardware modification.
The Von Neumann Computer
• Structure
– A memory for storing program and data. The memory consists of words of equal length.
– A control unit (control path) featuring a program counter for controlling program execution.
– An arithmetic and logic unit (ALU), also called the data path, for the execution of computations.
The Von Neumann Computer
• Coding
A program is coded as a set of instructions to be
sequentially executed
• Program execution
– Instruction Fetch (IF): The next instruction to be executed is fetched from the memory
– Decode (D): The instruction is decoded to determine the operation
– Read operand (R): The operands are read from the memory
– Execute (EX): The required operation is executed on the ALU
– Write result (W): The result of the operation is written back to the memory
Each instruction is executed in the cycle (IF, D, R, EX, W).
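The five-step cycle above can be sketched as a toy interpreter. This is only an illustration: the instruction format `(op, dst, src1, src2)`, the opcode names, and the memory layout are assumptions made for the example, not part of the lecture.

```python
# Toy von Neumann machine: one memory holds both program and data.
# Instruction format and opcode names are assumptions for illustration.

def run(memory, max_steps=100):
    """Execute instructions starting at address 0 until HALT."""
    pc = 0  # program counter kept by the control unit
    alu = {
        "ADD": lambda x, y: x + y,
        "SUB": lambda x, y: x - y,
        "MUL": lambda x, y: x * y,
    }
    for _ in range(max_steps):
        instr = memory[pc]                 # IF: fetch the next instruction
        pc += 1
        op, dst, src1, src2 = instr        # D: decode the operation
        if op == "HALT":
            break
        a, b = memory[src1], memory[src2]  # R: read operands from memory
        result = alu[op](a, b)             # EX: execute on the ALU
        memory[dst] = result               # W: write the result back
    return memory


# Program in words 0..2, data in words 10..13.
memory = {
    0: ("ADD", 12, 10, 11),  # mem[12] = mem[10] + mem[11]
    1: ("MUL", 13, 12, 10),  # mem[13] = mem[12] * mem[10]
    2: ("HALT", 0, 0, 0),
    10: 3, 11: 4, 12: 0, 13: 0,
}
run(memory)
print(memory[12], memory[13])
```

Changing the program only means writing different words into `memory`; the `run` loop, which stands in for the hardware, stays the same.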
The Von Neumann Computer
• Advantage
– Flexibility: any well-coded program can be executed
• Drawbacks
– Speed: Not optimal due to the sequential program execution (temporal resource sharing).
– Resource efficiency: Only one part of the hardware resources is required for the execution of an instruction. The rest remains idle.
– Memory access: Memories are about 10 times slower than the processor
• The drawbacks are compensated for by high clock speeds, pipelining, caches, instruction pre-fetching, etc.
The Von Neumann Computer
• Sequential execution
– t_cycle = cycle execution time
– One instruction needs t_instruction = 5 * t_cycle
– 3 instructions are executed in 15 * t_cycle
• Pipelining
– One instruction still needs t_instruction = 5 * t_cycle: no improvement.
– 3 instructions need 7 * t_cycle in the ideal case.
– Increased throughput
• Even with pipelining and other improvements like caches, the execution remains sequential.
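The cycle counts above follow from a simple rule: without pipelining, each instruction occupies all stages in turn; with an ideal pipeline, the first instruction takes all stages and every further instruction completes one cycle later. A minimal sketch of that arithmetic, assuming no stalls or hazards (the function names are made up for this example):

```python
def sequential_cycles(n_instructions, n_stages=5):
    """Cycles needed when each instruction runs all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=5):
    """Cycles needed in an ideal pipeline without stalls or hazards."""
    if n_instructions == 0:
        return 0
    # The first instruction fills the pipeline; each later one adds one cycle.
    return n_stages + (n_instructions - 1)


for n in (1, 3, 100):
    print(n, sequential_cycles(n), pipelined_cycles(n))
```

For a single instruction both functions return 5, which matches the "no improvement" remark: pipelining raises throughput, not the latency of one instruction.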
The SPARC LEON3 Harvard Architecture
Source: Cobham plc. GRLIB IP Core User’s Manual, 2016
Domain-specific processors
Goal:
Overcome the drawback of the von Neumann computer.
Optimize the Data path for a given class of applications
• DSP (Digital Signal Processor):
– Signal processing applications are usually dominated by multiply-accumulate (MAC) operations.
– The data path is optimized to execute one or several MACs in a single cycle.
– Instruction fetching and decoding overhead is removed
– Memory access is limited by directly processing the input data flow
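To see why signal processing is MAC dominated, consider a FIR filter: every output sample is a sum of coefficient-times-sample products. The sketch below is a plain software model; the function name `fir` and the test data are assumptions for illustration, and on a DSP each inner-loop step would map to one MAC operation.

```python
def fir(samples, coeffs):
    """Filter a sample stream with a FIR filter; return outputs and MAC count."""
    history = [0] * len(coeffs)  # most recent sample first
    outputs = []
    mac_count = 0
    for x in samples:
        history = [x] + history[:-1]    # shift the new sample in
        acc = 0
        for c, h in zip(coeffs, history):
            acc += c * h                # one multiply-accumulate (MAC)
            mac_count += 1
        outputs.append(acc)
    return outputs, mac_count


outputs, macs = fir([1, 2, 3, 4], [1, 1])
print(outputs, macs)
```

The inner loop body is nothing but a MAC, so a data path that completes one or several MACs per cycle speeds up essentially the whole computation.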
Domain-specific processors
• DSPs: Designed for high-performance, repetitive, numerically intensive tasks
• In one instruction cycle, one can execute:
– one or more MAC operations
– one or more memory accesses
– special support for efficient looping
• The hardware contains:
– One or more MAC units
– Multi-ported on-chip and off-chip memories
– Multiple on-chip buses
– An address generation unit supporting addressing modes tailored for DSP applications
Application-specific processors
Optimize the complete circuit for a given function
ASIC: Application-Specific Integrated Circuit.
Optimization is done by implementing the inherent parallel structure on a chip
The data path is optimized for only one
application.
Instruction fetching and decoding overhead is
removed
Memory access is limited by directly processing
the input data flow or by dedicated memories
Exploitation of parallel computation
Application-specific processors
ASIC example: implementation on a VN computer

if (a < b) then
{
    d = a+b;
    c = a*b;
}
else
{
    d = a+1;
    c = b-1;
}

At least 3 instructions
Exec. time >= 3 * t_instruction
ASIC implementation:
– The complete execution is done in parallel in a single clock cycle
– Exec. time = t_clock = delay of the longest path from input to output
– The VN computer needs to be clocked at least 3 times faster to reach an equal execution time
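One way to picture the ASIC version is as a data-flow circuit: the comparator, adder, multiplier, incrementer and decrementer all work on the inputs at the same time, and two multiplexers pick the results. The sketch below models this in software; the function names are made up, and Python of course evaluates the statements one after another, so it only shows the structure, not the timing.

```python
def vn_style(a, b):
    """Sequential version: compare first, then run one branch."""
    if a < b:
        d = a + b
        c = a * b
    else:
        d = a + 1
        c = b - 1
    return d, c


def asic_style(a, b):
    """Data-flow version: every unit computes, multiplexers select."""
    # In hardware these five units operate concurrently on the same inputs.
    less = a < b        # comparator
    add_ab = a + b      # adder
    mul_ab = a * b      # multiplier
    inc_a = a + 1       # incrementer
    dec_b = b - 1       # decrementer
    # Two multiplexers controlled by the comparator output.
    d = add_ab if less else inc_a
    c = mul_ab if less else dec_b
    return d, c


print(vn_style(2, 3), asic_style(2, 3))
print(vn_style(3, 2), asic_style(3, 2))
```

In this structure the clock period is set by the slowest path from input to output, which here would typically run through the multiplier and its multiplexer.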
Conclusion
• Von Neumann computer:
General purpose, used for any kind of function.
High degree of flexibility.
However, strong restrictions on the program coding and execution
scheme: the program has to adapt to the machine
• DSPs are adapted for a class of applications.
Flexibility and efficiency only for a given class of applications.
• ASICs are
Tailored for one application.
Very efficient in speed and resource usage
Cannot re-adapt to a new application
Not flexible
Reconfigurable Computing: Definition
The ideal device should combine:
the flexibility of the Von Neumann computer
the efficiency of ASICs
The ideal device should be able to
optimally implement an application at a given time
re-adapt to allow the optimal implementation of a new
application.
We call such hardware devices reconfigurable.
Reconfigurable computing can be defined as the study of
computation involving reconfigurable devices. This includes
architecture, algorithms and applications.
Flexibility vs Efficiency
[Figure: flexibility plotted against energy/area efficiency for four classes: Von Neumann (general-purpose computing), DSP (domain-specific computing), reconfigurable systems (reconfigurable computing), and ASIC (application-specific computing).]
More detailed
[Figure: mW/MOPS (1E-05 to 1E+02) plotted against MOPS/mm² (1E+00 to 1E+06), all entries properly scaled to 130 nm. Design styles shown: GP-Processor, DSP, FPGA, Standard Cell, Physically Optimized. Marked examples: Embedded ARM 940T, Embedded TI DSP, Embedded FPGA Macro. Annotation in the figure: "seems to be an attractive compromise …"]
Source: Prof. T. Noll, RWTH Aachen
Source: Altera, Inc. Cyclone II
Starter Development Kit.
Fields of application
Rapid prototyping
Post fabrication customization
Multi-modal computing tasks
Adaptive computing systems
Fault-tolerance
High performance parallel computing
Rapid Prototyping
Testing hardware before
fabrication
Software simulation
Relatively inexpensive
Slow
Accuracy?
Hardware emulation
Hardware testing under real operation
conditions
Fast
(Relatively) Accurate
Allows for several iterations
APTIX System Explorer (Source: Aptix, Corp.)
EVE ZeBu-Server (Source: EVE, Corp.)
Synopsys CHIPit System (Source: Synopsys, Inc.)
Post fabrication customization
Time to market advantage
Ship the first version of a product
Remote upgrading with new
product versions
Remote repairing
Manufacturer
Source: All from MediaWiki
Multi-modal computing tasks
Reconfigurable vehicles, mobile
phones, etc.
Built-in Digital Camera
Video phone service
Games
Internet
Navigation system
Emergency
Diagnostics
Different standards and protocols
Monitoring
Entertainment
service request
configuration
Source: Continental AG
Source: MediaWiki
Source: Pixabay.com
Source: Pexels.com
Adaptive computing systems
Computing systems that are able to adapt
their behaviour and structure to changing
operating and environmental conditions,
time-varying optimization objectives, and
physical constraints like changing
protocols, new standards, or dynamically
changing operation conditions of technical
systems.
Jürgen Teich
Dynamic adaptation to environment
Dynamic adaptation to threats (DARPA)
Extended mission capabilities
Source: MediaWiki
Fault-Tolerance
The RecoNets project at FAU:
Autonomous fault detection
on communication lines
Detection of defective nodes
Task migration on node
failure
Load balancing computation
Source: Redlogix GmbH
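Task migration on node failure combined with load balancing can be illustrated with a small greedy sketch. This is not the RecoNets algorithm; the function `migrate_on_failure`, the node and task names, and the load figures are all hypothetical and only show the idea of moving orphaned tasks to the least-loaded surviving node.

```python
def migrate_on_failure(nodes, assignment, load, failed):
    """Return a new task-to-node assignment after node `failed` drops out.

    nodes: list of node names; assignment: task -> node; load: task -> cost.
    Greedy heuristic (an assumption, not the project's method): place each
    orphaned task, heaviest first, on the currently least-loaded live node.
    """
    alive = [n for n in nodes if n != failed]
    if not alive:
        raise RuntimeError("no surviving node to migrate to")
    node_load = {n: 0 for n in alive}
    for task, node in assignment.items():
        if node != failed:
            node_load[node] += load[task]
    new_assignment = dict(assignment)
    orphans = [t for t, n in assignment.items() if n == failed]
    for task in sorted(orphans, key=lambda t: -load[t]):
        target = min(alive, key=lambda n: node_load[n])
        new_assignment[task] = target
        node_load[target] += load[task]
    return new_assignment


nodes = ["S1", "S2", "S3"]                       # hypothetical nodes
assignment = {"Ctrl1": "S1", "Sen1": "S1", "Akt1": "S2", "Akt2": "S3"}
load = {"Ctrl1": 3, "Sen1": 1, "Akt1": 2, "Akt2": 1}  # hypothetical costs
result = migrate_on_failure(nodes, assignment, load, failed="S1")
print(result)
```

In a reconfigurable network the migrated task may be a hardware module as well as software, which is where free FPGA resources on the surviving nodes come in.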
Fault-Tolerance
Self-repairing automotive systems
Who decides when, where and how a task is executed?
[Figure: functionality/behavior (Ctrl1, Sen1, Akt1, Akt2) mapped onto an architecture. Other labels in the figure: ReCoNode (FPGA), ReCoLink, S1, S2, S3, BH.]
Demonstrator
• Network:
– ECU hardware:
Altera-Excalibur-Boards
• NIOS-CPU
• Free hardware resources
– Operating system: µC/OS-II
– Point-to-Point-Link comm.
• Driver Assistance System:
– Lane Detection
– Warning when leaving the lane unintentionally
Cockpit
Turn light indicator switch
Sources: MediaWiki and Koch, D. Partial Reconfiguration on FPGAs: Architecture, Tools and Applications, Springer, 2013
High performance parallel computing
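Traditional parallel implementation flow
[Figure: the application is expressed as a virtual topology (a tree of tasks 1 to 7), which is mapped onto a fixed physical topology (nodes 1 to 4).]
Exploiting reconfigurable topology
[Figure: the application's virtual topology (a tree of tasks 1 to 7) is matched by a physical topology of the same shape (nodes 1 to 7).]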
Top View: Field-Effect Transistor
The Microprocessor
10 years of Moore’s-law progress led to the microprocessor
Raised engineers’ productivity
Problem-solving became programming
Grew to billions of units/year
Further speed gains can no longer be expected because transistors are becoming less reliable and more variable
Stalled progress in design methods for thirty years
Microprocessor Bottlenecks
The future of the microprocessor
• Multi-core designs are already available, but they have major problems:
– Shared Memory Model does not scale to hundreds of processors on
a chip
– Distributed Memory Model is difficult to program
– Power consumption and temperature are further problems
• Reconfigurable Processors, Networks, and Memories on a
Chip may be the solution…
Reconfigurable Networks on a Chip
• FPGA: Xilinx Virtex II 6000
– 4x4 DyNoC
– 7% of chip area
– Router latency of 2.5 ns
– 32-bit data bus and 6x4x32-bit FIFO for each router
Source: Image obtained from Xilinx ISE
Design of Reconfigurable Modules
– The Erlangen Slot Machine
– Example:
Development of partially reconfigurable video filter modules
Source: M. Majer, et al. The Erlangen Slot Machine: A dynamically reconfigurable FPGA-based computer, J. VLSI Signal Process. Syst., vol. 47,
pp. 15–31, Apr. 2007.
Example: Partial Reconfiguration
• Partial reconfiguration of different video filters
– Normal
– Grey filter
– Invert
– Contrast enhancement
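As a software analogy only, the four filters can be modelled as interchangeable modules plugged into one slot while the rest of the system keeps running. On a real FPGA, partial reconfiguration loads a partial bitstream into a region of the device; the `Slot` class, the pixel format (8-bit RGB tuples), and the filter formulas below are assumptions made for this sketch, not the Erlangen Slot Machine design.

```python
def normal(pixel):
    """Pass the pixel through unchanged."""
    return pixel


def grey(pixel):
    """Replace each channel with the plain average of R, G and B."""
    r, g, b = pixel
    y = (r + g + b) // 3
    return (y, y, y)


def invert(pixel):
    """Invert each 8-bit channel."""
    return tuple(255 - v for v in pixel)


def contrast(pixel, gain=2):
    """Stretch each channel around mid-grey and clamp to 0..255."""
    return tuple(max(0, min(255, (v - 128) * gain + 128)) for v in pixel)


class Slot:
    """Stand-in for one reconfigurable region holding a filter module."""

    def __init__(self):
        self.module = normal

    def reconfigure(self, module):
        # Swap the module; the surrounding system is left untouched.
        self.module = module

    def process(self, frame):
        return [self.module(p) for p in frame]


frame = [(10, 20, 30), (200, 100, 0)]
slot = Slot()
results = {}
for module in (normal, grey, invert, contrast):
    slot.reconfigure(module)
    results[module.__name__] = slot.process(frame)
print(results)
```

The point of the analogy is that only the content of the slot changes between filters; the frame source and sink never need to know which module is loaded.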
Source: Image obtained from Xilinx ISE
What do I learn?
• What is Reconfigurable Computing (RC)?
– Get to know the types and kinds of reconfigurable device technologies available today
– Understand pros and cons of RC technology
– Learn about applications benefiting from RC
• Gain Design Know-How and Expertise:
– Learn to design circuits and full systems on FPGAs
– Understand mapping steps and optimization algorithms
• Get to Know the Borders of RC:
– Understand leading-edge research problems on the exploitation of run-time reconfigurable design methodologies, including future trends