Multi-cellular paradigm
The molecular level can support self-replication (and self-repair).
But we also need cells that can be designed to fit the specific application and at the same time able to support bio-inspired mechanisms for self-replication and fault tolerance.
Cellular differentiation
Cells adapt their physical structure to fit the “application”.
Can circuits/processors do the same?
• Physically? No
• Logically? Yes, but…
Can they do it easily (dare we say, automatically)?
The MOVE paradigm
• One single instruction: move
• Data displacements trigger operations
• Architecture based around data (≠ operation-centric)
• Regular structure: functional units + data network
• Scalable and modular architecture
Example: Sum of two values
Conventional architecture:
add R1, R2, R3;
MOVE architecture:
move O(Fxxx), I1(Fsum)
move O(Fyyy), I2(Fsum)
move O(Fsum), I(Fzzz)
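The three moves above can be mimicked with a very small simulator. This is only a sketch under assumptions that are not in the slides: the `Register` and `SumUnit` classes, the port names as Python strings, and the choice of I2 as the port whose write triggers the addition are all hypothetical.

```python
class Register:
    """Hypothetical storage unit with one input port I and one output port O."""

    def __init__(self, value=0):
        self.value = value

    def write(self, port, value):
        if port != "I":
            raise KeyError(port)
        self.value = value

    def read(self, port):
        if port != "O":
            raise KeyError(port)
        return self.value


class SumUnit:
    """Hypothetical adder FU: I1 latches an operand, I2 is assumed to be the trigger."""

    def __init__(self):
        self.i1 = 0
        self.out = 0

    def write(self, port, value):
        if port == "I1":
            self.i1 = value               # operand latch, no side effect
        elif port == "I2":
            self.out = self.i1 + value    # the data displacement triggers the add
        else:
            raise KeyError(port)

    def read(self, port):
        if port != "O":
            raise KeyError(port)
        return self.out


def move(units, src, dst):
    """The single instruction: copy a value from an output port to an input port."""
    (src_unit, src_port), (dst_unit, dst_port) = src, dst
    units[dst_unit].write(dst_port, units[src_unit].read(src_port))


units = {"Fxxx": Register(3), "Fyyy": Register(4),
         "Fsum": SumUnit(), "Fzzz": Register()}
program = [
    (("Fxxx", "O"), ("Fsum", "I1")),
    (("Fyyy", "O"), ("Fsum", "I2")),
    (("Fsum", "O"), ("Fzzz", "I")),
]
for src, dst in program:
    move(units, src, dst)
```

Nothing in `program` names an operation: the sum happens only because a value arrives at the adder's trigger port.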
Genotype Layer
Phenotype Layer
Example – Automatic Synthesis
Application-specific (parallel) functions
Developmental algorithm
Genetic code
Mapping Layer
Phenotype Layer
Cell design and specialization
Application code (parallel)
Within a MOVE framework, the specialization (differentiation) of a cell corresponds to the selection of the functional and communication units that can most efficiently implement the desired application.
Hardware/software co-design
• Co-design of a system: tailor a hardware system for a specific application, which can then exploit the synergy between that hardware and the software.
• In a MOVE context (but also generally), this amounts to defining the structure of the processor (data size, number of buses, functional and communication units, etc.) at the same time as the program that runs on the processor.
• HW/SW co-design is a task for the compiler.
Hardware-Software Partitioning
• Purpose: determine which parts of an algorithm (or a program) are the best candidates to be implemented in hardware
• Optimization goal: minimize execution time and/or hardware area
• Constraints: hardware size and/or execution time
It is a known NP-hard problem.
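To make the partitioning problem concrete, here is a minimal exhaustive search that minimizes execution time under an area budget. It is a sketch, not the method used in this presentation: the block names and the (software time, hardware time, hardware area) figures are invented for illustration. The loop visits every subset of blocks, so its cost doubles with each additional block, which is why exact search does not scale.

```python
from itertools import combinations


def best_partition(blocks, area_budget):
    """blocks maps a name to (sw_time, hw_time, hw_area).

    Returns (total_time, names_moved_to_hardware) for the fastest
    partition whose hardware area fits within area_budget.
    """
    names = sorted(blocks)
    # Baseline: everything stays in software.
    best = (sum(blocks[n][0] for n in names), ())
    for k in range(1, len(names) + 1):
        for chosen in combinations(names, k):
            area = sum(blocks[n][2] for n in chosen)
            if area > area_budget:
                continue  # violates the hardware-size constraint
            time = sum(blocks[n][1] if n in chosen else blocks[n][0]
                       for n in names)
            if time < best[0]:
                best = (time, chosen)
    return best


# Hypothetical figures: (software time, hardware time, hardware area)
blocks = {"fir": (100, 20, 50), "crc": (40, 10, 30), "sort": (60, 45, 40)}
best_time, in_hw = best_partition(blocks, area_budget=80)
```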
FU extraction
Extracting the optimal FUs from the code is a complex problem!
FU extraction
How about having a quick peek at biology?
Idea: let us use evolution!!
In fact, this approach is much closer to biology than simply evolving code: in nature, the hardware (the cell) and the software (the genome) have evolved together!
FU extraction
Idea: let us use evolution to determine which FUs should be implemented for a given program, starting from a given hardware budget and a “minimal” processor
FU extraction
First step: profile the code (standard compilation technique)
FU extraction
Second step: transform the code into a tree (standard compilation technique), then define a field that determines whether parts of the tree should be turned into hardware.
Third step: represent the tree as a 1-D genome (standard EA technique)
Fourth step: run the EA (with some fancy optimizations)
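The second and third steps can be sketched as follows. The `Node` class, the `in_hw` field name, the pre-order flattening and the bit-flip mutation are assumptions chosen for illustration; the slides do not specify the actual encoding or the EA operators.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    children: list = field(default_factory=list)
    in_hw: bool = False  # extra field: should this subtree become hardware?


def preorder(node):
    """Visit the tree root first, then each child subtree left to right."""
    yield node
    for child in node.children:
        yield from preorder(child)


def to_genome(root):
    """Flatten the per-node hardware flags into a 1-D list of 0/1 genes."""
    return [int(n.in_hw) for n in preorder(root)]


def apply_genome(root, genome):
    """Write a genome back onto the tree, one gene per node."""
    nodes = list(preorder(root))
    if len(nodes) != len(genome):
        raise ValueError("genome length must match the number of tree nodes")
    for node, gene in zip(nodes, genome):
        node.in_hw = bool(gene)


def mutate(genome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


# Expression tree for (a + b) * c
tree = Node("*", [Node("+", [Node("a"), Node("b")]), Node("c")])
initial = to_genome(tree)              # all software: [0, 0, 0, 0, 0]
apply_genome(tree, [0, 1, 0, 0, 0])    # mark the "+" subtree as hardware
```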
Fitness evaluation
s = size of the new processor
t = execution time of the program on the new processor
α = execution time of the program on a minimal processor
β = hardware area to implement the minimal processor (which has, by definition, a fitness of 1)
hwLimit = maximum hardware allowed to implement the new processor
Note:
• Relative fitness function
• When out of the allowed hardware range, logarithmic decrease
• The hardware investment has to be small enough to be retained
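The fitness formula itself does not survive in this transcript, so the following is only one possible form that is consistent with the notes above. It is an assumption, not the authors' function: speedup relative to the minimal processor (α/t) times relative area (β/s), divided by a logarithmic penalty once s exceeds hwLimit. With t = α and s = β it evaluates to 1, as required for the minimal processor.

```python
import math


def fitness(s, t, alpha, beta, hw_limit):
    """Assumed relative fitness; NOT the formula from the original slide.

    s: size of the new processor
    t: execution time of the program on the new processor
    alpha: execution time on the minimal processor
    beta: hardware area of the minimal processor
    hw_limit: maximum hardware allowed for the new processor
    """
    base = (alpha / t) * (beta / s)   # equals 1.0 for the minimal processor
    if s <= hw_limit:
        return base
    # Out of the allowed hardware range: assumed logarithmic decrease.
    return base / (1.0 + math.log(s / hw_limit))
```

Under this assumed form, a candidate that is 10% larger and 2.27 times faster than the minimal processor scores about 2.06.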
Determining hardware size
How can the size of the new FU be estimated (the β parameter of the fitness)?
The idea:
• Determine the size of each basic building block (+, -, AND, …)
• Compute how many of them are used for a new FU
• Add a cost for the bus interface
The characterization has to be done for every target platform.
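The estimate described above is a weighted count plus a fixed interface cost. In this sketch the per-block areas and the bus interface cost are made-up numbers in arbitrary units; real values would come from characterizing the target platform.

```python
from collections import Counter

# Hypothetical characterization of one target platform (arbitrary area units).
BLOCK_AREA = {"+": 32, "-": 32, "AND": 8, "OR": 8, "*": 600}
BUS_INTERFACE_AREA = 40


def estimate_fu_area(ops, block_area=BLOCK_AREA, bus_area=BUS_INTERFACE_AREA):
    """Sum the area of every building block used by the FU, then add the bus cost."""
    counts = Counter(ops)
    return sum(block_area[op] * n for op, n in counts.items()) + bus_area


area = estimate_fu_area(["+", "+", "AND"])  # two adders and one AND gate
```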
Determining hardware execution time
Use the same idea as for size to calculate the execution time:
• Compute the time needed for each building block
• Take the targeted clock period as a basis
• When the estimated time > clock period, add 1 to the total time
This produces small jumps in the fitness landscape.
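One reading of the rule above, sketched with invented delays: the building blocks of an FU are assumed to be chained, their delays are summed, and every clock period that the sum exceeds costs one more cycle. The delay table and the chaining assumption are hypothetical.

```python
import math

# Hypothetical per-block delays in nanoseconds for one target platform.
BLOCK_DELAY_NS = {"+": 4.0, "-": 4.0, "AND": 1.0, "*": 9.0}


def estimate_fu_cycles(chain, clock_period_ns, delay=BLOCK_DELAY_NS):
    """Cycles needed by a chain of blocks, never less than one cycle."""
    total = sum(delay[op] for op in chain)
    return max(1, math.ceil(total / clock_period_ns))


one_cycle = estimate_fu_cycles(["+", "AND"], clock_period_ns=10.0)   # 5 ns
two_cycles = estimate_fu_cycles(["+", "*"], clock_period_ns=10.0)    # 13 ns
```

Because the result is a whole number of cycles, adding one block can leave the estimate unchanged or bump it by a full cycle, which is where the jumps in the fitness landscape come from.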
Improvements
Biggest problem with the approach presented: not very competitive because of the search space size
• Does not converge to a solution in reasonable time
• VAST amount of “junk” solutions
How can we improve the performance? How about using some heuristics?
• Try to reuse the inferred FU at different places in the program
• Remove “useless” FUs
Pattern-matching optimization
How to find reusable FUs?
• The GA behaves a bit like random mutations, so it is difficult to find reusability this way
• Solution: search the whole tree each time a new HW block is defined, to replace similar pieces of code
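A minimal sketch of such a search, assuming that expression trees are nested tuples and that "similar" means the same operator structure with arbitrary operands. Both are simplifications made here; the slides do not define the matching criterion. The name `FU_mac` is hypothetical.

```python
def shape(tree):
    """Operator structure of a tree, with every operand leaf replaced by '_'."""
    if isinstance(tree, tuple):
        return (tree[0],) + tuple(shape(c) for c in tree[1:])
    return "_"


def leaves(tree):
    """Operand leaves of a tree, left to right."""
    if isinstance(tree, tuple):
        out = []
        for child in tree[1:]:
            out.extend(leaves(child))
        return out
    return [tree]


def replace_matches(tree, pattern, fu_name):
    """Replace every subtree shaped like `pattern` by a call to the new FU.

    Returns (new_tree, number_of_replacements).
    """
    if not isinstance(tree, tuple):
        return tree, 0
    if shape(tree) == shape(pattern):
        return (fu_name,) + tuple(leaves(tree)), 1
    count = 0
    new_children = []
    for child in tree[1:]:
        new_child, n = replace_matches(child, pattern, fu_name)
        new_children.append(new_child)
        count += n
    return (tree[0],) + tuple(new_children), count


# (a*b + c) - (x*y + z): the multiply-add shape occurs twice.
program = ("-", ("+", ("*", "a", "b"), "c"), ("+", ("*", "x", "y"), "z"))
pattern = ("+", ("*", "a", "b"), "c")
new_program, hits = replace_matches(program, pattern, "FU_mac")
```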
Non-optimal block pruning
“Cleaning” phase made at each generation of the algorithm
Removes HW blocks that are non-optimal from the fitness point-of-view
To see if a block is useful, compute the fitness with and without this block implemented in HW. If the software solution has a better fitness, the block is non-optimal and can be removed.
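The with/without comparison can be sketched as a single greedy pass over the hardware blocks. The pass order, the block names, and the toy fitness (gain divided by area) are assumptions for illustration; in the real flow `fitness_of` would evaluate the whole candidate processor.

```python
def prune_blocks(hw_blocks, fitness_of):
    """Drop each HW block whose removal gives a strictly better fitness."""
    kept = set(hw_blocks)
    for block in sorted(hw_blocks):
        without = kept - {block}
        if fitness_of(frozenset(without)) > fitness_of(frozenset(kept)):
            kept = without  # the software version wins: block is non-optimal
    return kept


# Toy model with hypothetical numbers: speed gain and area cost per block.
GAIN = {"mac": 1.0, "shift": 0.02, "crc": 0.5}
AREA = {"mac": 0.1, "shift": 0.3, "crc": 0.1}


def toy_fitness(blocks):
    speed = 1.0 + sum(GAIN[b] for b in blocks)
    area = 1.0 + sum(AREA[b] for b in blocks)
    return speed / area


kept = prune_blocks({"mac", "shift", "crc"}, toy_fitness)
```

Here `shift` costs far more area than it gains in speed, so the pass removes it and keeps the other two blocks.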
FU extraction - Interface
STANDARD DOMAIN
FU extraction - Results
Example (functions from FACT factorization algorithm):
• Hardware increase (estimated): 10% (fixed)
• Speedup (estimated): 2.27 (227%)
Other results:
All were obtained in a few seconds
Problem solved?
Evolution allows the extraction of FUs according to arbitrary criteria:
• Performance
• Size
• Power consumption
• Reliability
• Any other…
and in any combination
It also generates a range of solutions, allowing the engineer to choose among several options
Problem solved?
Definitely not… co-design is a HARD problem and a lot remains to be explored.
Can Nature help?
• Metabolic pathways
• Immune systems
• Stem cells
• Regeneration
• Homeostasis
• Developmental plasticity
• etc…
For all of these, an efficient mapping to hardware remains an open research topic