High Performance Embedded Computing
© 2007 Elsevier
Chapter 7, part 3: Hardware/Software Co-Design
High Performance Embedded Computing, Wayne Wolf
© 2006 Elsevier
Topics
Multi-objective optimization. Co-synthesis for control. Co-synthesis for caches. Co-synthesis for reconfigurable platforms. Hardware/software co-simulation.
Multi-objective optimization
Operations research provides formal notions for optimization problems with multiple objective functions.
Pareto optimality: a solution is Pareto optimal if none of its objectives can be improved without making at least one other objective worse.
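To make the definition concrete, here is a minimal sketch that filters a set of candidate designs down to the Pareto-optimal ones. It assumes every objective is to be minimized (for example price and power); the function names and sample numbers are illustrative only.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one.

    Assumption: all objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, power) pairs for five candidate designs.
designs = [(10, 5), (8, 7), (12, 4), (11, 6), (9, 9)]
front = pareto_front(designs)
```

Here (11, 6) is dropped because (10, 5) is cheaper and uses less power, and (9, 9) is dropped because of (8, 7); the remaining three trade price against power, so none of them can be improved in one objective without worsening the other.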
GOPS
Feasibility factor computed from throughput factors.
Upper-bound throughput for RMS:
Upper-bound throughput for EDF:
Upper bound feasibility
Upper-bound feasibility tests:
Lower bound feasibility test
Lower bound:
Feasibility factor
Feasibility factor P:
Use feasibility factor to prune the search space and as an optimization objective.
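As a point of reference for the RMS and EDF bounds, the classic utilization-based schedulability tests of Liu and Layland can be sketched as follows. These are the standard textbook bounds (independent, preemptible periodic tasks with deadlines equal to periods), not necessarily the exact GOPS throughput-factor formulas; the task tuples are hypothetical.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # (execution time, period); illustrative representation


def utilization(tasks: List[Task]) -> float:
    """Total processor utilization U = sum of C_i / T_i."""
    return sum(c / t for c, t in tasks)


def rms_bound_ok(tasks: List[Task]) -> bool:
    """Liu-Layland test for rate-monotonic scheduling: U <= n(2^(1/n) - 1).

    Sufficient but not necessary: a task set that fails may still be schedulable.
    """
    n = len(tasks)
    if n == 0:
        return True
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


def edf_ok(tasks: List[Task]) -> bool:
    """EDF test under the same assumptions: U <= 1 is necessary and sufficient."""
    return utilization(tasks) <= 1.0
```

With two tasks the RMS bound is 2(sqrt(2) - 1), about 0.83, so a task set with U = 0.9 fails the RMS test but still passes the EDF test. A gap of this kind between bounds is what makes such tests useful for pruning: a candidate that fails the necessary test can be discarded without running a detailed schedule.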
Genetic algorithms
Modeled on genetics: genes are strings of symbols, and mutations are changes to those strings.
Types of moves: Reproduction makes a copy of a string. Mutation changes a string. Crossover interchanges parts of two strings.
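The three moves can be sketched on plain symbol strings as below. This is a generic illustration under simple assumptions (single-point crossover, single-symbol mutation), not the move set of any particular co-synthesis tool.

```python
import random
from typing import Tuple


def reproduce(s: str) -> str:
    """Reproduction: copy a string unchanged (Python strings are immutable)."""
    return s


def mutate(s: str, alphabet: str, rng: random.Random) -> str:
    """Mutation: overwrite one randomly chosen position with a random symbol.

    The new symbol may happen to equal the old one.
    """
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(alphabet) + s[i + 1:]


def crossover(a: str, b: str, point: int) -> Tuple[str, str]:
    """Single-point crossover: swap the tails of two strings after `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]
```

For example, crossing "AAAA" with "BBBB" at point 2 yields "AABB" and "BBAA".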
MOGAC
Technology tables characterize hardware components.
Genetic model:
Processing element allocation string lists all PEs and their types.
Task allocation string shows the assignment of tasks to PEs.
Link allocation string maps communication to links.
IC allocation string maps tasks to chips.
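One way to picture these four strings is as fields of a single candidate solution. The sketch below is a hypothetical data structure (class, field, and method names are invented for illustration), not MOGAC's internal encoding; it also shows a simple consistency check that a genetic operator might need to preserve.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Chromosome:
    # PE instance -> PE type drawn from the technology tables
    pe_allocation: Dict[str, str]
    # task -> PE instance that executes it
    task_allocation: Dict[str, str]
    # (source task, destination task) communication edge -> link
    link_allocation: Dict[Tuple[str, str], str]
    # task -> IC (chip) on which it is placed
    ic_allocation: Dict[str, str]

    def tasks_on(self, pe: str) -> List[str]:
        """Tasks assigned to one PE instance, in sorted order."""
        return sorted(t for t, p in self.task_allocation.items() if p == pe)

    def is_consistent(self) -> bool:
        """Every task must run on an allocated PE and sit on some IC."""
        on_allocated_pe = all(p in self.pe_allocation for p in self.task_allocation.values())
        placed = all(t in self.ic_allocation for t in self.task_allocation)
        return on_allocated_pe and placed
```

A mutation that deletes a PE from `pe_allocation` would make the solution inconsistent unless the tasks mapped to it are also reassigned.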
MOGAC optimization procedure
Forms an initial solution.
Repeats the evolve/evaluate cycle.
Evaluation determines the noninferior solutions; some noninferior solutions may not survive evolution.
Clusters solutions to reduce run time.
[Dic98] © 1998 IEEE
MOGAC constraints
nis(x): the noninferior solutions in x.
dom(a,b) = 1 if a is not dominated by b.
Cluster rank:
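The dom and nis notation can be transcribed directly into code. This sketch assumes each solution is a vector of objective values that are all minimized; it does not attempt the cluster-rank formula itself.

```python
from typing import List, Sequence

Solution = Sequence[float]  # objective values, all minimized (assumption)


def dom(a: Solution, b: Solution) -> int:
    """1 if a is not dominated by b, else 0."""
    b_dominates_a = all(y <= x for x, y in zip(a, b)) and any(y < x for x, y in zip(a, b))
    return 0 if b_dominates_a else 1


def nis(x: List[Solution]) -> List[Solution]:
    """Noninferior solutions in x: members not dominated by any member of x.

    dom(a, a) is 1, so comparing a solution with itself never removes it.
    """
    return [a for a in x if all(dom(a, b) for b in x)]
```

For x = [(1, 3), (2, 2), (3, 3)], the solution (3, 3) is dominated by (2, 2), so nis(x) keeps only the first two.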
Energy-aware task scheduling
Yang et al. schedule multiprocessors for energy, combining design-time and run-time techniques:
At design time, the scheduler evaluates scheduling/allocation choices, optimizes them with genetic algorithms, and generates a table.
At run time, heuristics use the table to choose the best scheduling/allocation pattern.
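The split between design time and run time can be sketched as a precomputed lookup table plus a cheap run-time selection rule. Everything here is hypothetical: the scenario keys, the (energy, latency) numbers, and the rule "cheapest pattern that meets the deadline" stand in for whatever the genetic-algorithm exploration and run-time heuristic actually produce.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional


class Pattern(NamedTuple):
    energy: float   # energy per iteration (arbitrary units)
    latency: float  # completion time (arbitrary units)
    mapping: str    # label for one scheduling/allocation choice


# Design time: for each scenario (set of active applications), store a few
# pre-evaluated patterns. Values are invented for illustration.
TABLE: Dict[FrozenSet[str], List[Pattern]] = {
    frozenset({"video"}): [
        Pattern(5.0, 30.0, "v-fast"),
        Pattern(3.0, 45.0, "v-slow"),
    ],
    frozenset({"video", "audio"}): [
        Pattern(8.0, 35.0, "va-fast"),
        Pattern(6.0, 60.0, "va-slow"),
    ],
}


def choose(active: FrozenSet[str], deadline: float) -> Optional[Pattern]:
    """Run time: pick the lowest-energy stored pattern that meets the deadline."""
    feasible = [p for p in TABLE.get(active, []) if p.latency <= deadline]
    return min(feasible, key=lambda p: p.energy) if feasible else None
```

The expensive search happens offline; at run time the cost is a table lookup and a scan over a handful of entries, which is what makes the approach practical on an embedded target.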
Co-synthesis for wireless
Wireless systems are limited in both bandwidth and energy.
COWLS uses parallel recombinative simulated annealing. Candidate solutions are ranked by communication time, computation time, and utilization.
Scheduling influences both power consumption and timing; slack determines idle time.
Control and I/O synthesis
The control finite-state machine (CFSM) model describes control-dominated systems.
Event-driven model with finite, non-zero, but unbounded reaction times.
Implementations:
Hardware is logic guarded by latches.
Software is synthesized from an s-graph, which models the control flow graph.
CFSMs can be used as an intermediate representation for Esterel and similar languages.
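The event-driven behavior can be sketched as a machine that does nothing until an input event arrives, then reacts once: it takes a transition, consumes its inputs, and emits output events. This is an illustrative sketch only (not POLIS code or s-graph synthesis); it assumes one-place event buffers and a deterministic tie-break, and the light-controller transitions are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (current state, input event) -> (next state, output events)
Transitions = Dict[Tuple[str, str], Tuple[str, List[str]]]


class CFSM:
    """Minimal event-driven CFSM sketch."""

    def __init__(self, initial: str, transitions: Transitions) -> None:
        self.state = initial
        self.transitions = transitions
        # One-place buffers (assumption): posting an event twice before the
        # next reaction has the same effect as posting it once.
        self.pending: Set[str] = set()

    def post(self, event: str) -> None:
        self.pending.add(event)

    def react(self) -> List[str]:
        """One reaction: take the first enabled transition (events checked in
        sorted order), consume all pending events, and return emitted events."""
        outputs: List[str] = []
        for event in sorted(self.pending):
            key = (self.state, event)
            if key in self.transitions:
                self.state, emitted = self.transitions[key]
                outputs = list(emitted)
                break
        self.pending.clear()
        return outputs


LIGHT: Transitions = {
    ("off", "press"): ("on", ["light_on"]),
    ("on", "press"): ("off", ["light_off"]),
}
```

With no pending events a reaction emits nothing; two "press" events posted before one reaction collapse into a single toggle, which is one way lost events show up in a system with unbounded reaction times.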
Modal process model
Chou et al. use modal models: I/O behavior depends on the current mode and on the inputs.
Abstract control types define control operations with known properties.
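A modal process can be sketched as a table from mode names to I/O behaviors, where control inputs change the mode and data inputs are processed by whichever behavior is current. The class and the three modes below are hypothetical and do not model abstract control types.

```python
from typing import Callable, Dict


class ModalProcess:
    """The output for a given input depends on the current mode."""

    def __init__(self, modes: Dict[str, Callable[[int], int]], initial: str) -> None:
        self.modes = modes
        self.mode = initial

    def set_mode(self, mode: str) -> None:
        """Control input: switch to another declared mode."""
        if mode not in self.modes:
            raise ValueError(f"unknown mode {mode!r}")
        self.mode = mode

    def step(self, value: int) -> int:
        """Data input: apply the current mode's I/O behavior."""
        return self.modes[self.mode](value)


proc = ModalProcess({"pass": lambda x: x, "scale": lambda x: 2 * x, "mute": lambda x: 0}, "pass")
```

The same input 3 produces 3 in "pass" mode, 6 in "scale" mode, and 0 in "mute" mode.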
Interface synthesis
Chou et al. represent I/O as control flow graphs. Their synthesis generates tasks, allocates I/O ports, splits wide-word operations, uses memory-mapped I/O where necessary, and generates an I/O sequencer.
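Splitting a wide-word operation means that one logical write of, say, 32 bits becomes several writes to a narrower port. The sketch below assumes an 8-bit port and least-significant-chunk-first ordering; both choices are illustrative, not taken from Chou et al.

```python
from typing import List


def split_word(value: int, word_bits: int = 32, port_bits: int = 8) -> List[int]:
    """Split one wide write into port-width chunks, least-significant first (assumed order)."""
    if word_bits % port_bits:
        raise ValueError("word width must be a multiple of the port width")
    mask = (1 << port_bits) - 1
    return [(value >> shift) & mask for shift in range(0, word_bits, port_bits)]


def join_words(chunks: List[int], port_bits: int = 8) -> int:
    """Inverse of split_word: reassemble the wide value on the receiving side."""
    value = 0
    for i, chunk in enumerate(chunks):
        value |= chunk << (i * port_bits)
    return value
```

A 32-bit write of 0x12345678 through an 8-bit port becomes the sequence 0x78, 0x56, 0x34, 0x12; the I/O sequencer is then responsible for issuing those four port operations in order.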
Daveau et al. synthesize communication by allocating communication operations to units in a library. A communication unit must provide the required services, use the right protocol, and run at the required data rate.
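The three requirements translate naturally into a filter over a component library, followed by a cost-based choice among the survivors. The unit descriptions, cost metric, and "cheapest feasible unit" rule are assumptions for illustration, not Daveau et al.'s allocation algorithm.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class CommUnit:
    name: str
    services: FrozenSet[str]  # e.g. {"send", "receive"}
    protocol: str             # e.g. "handshake", "fifo"
    max_rate: float           # words per second the unit can sustain
    cost: float               # area or another cost metric


def allocate(library: List[CommUnit], services: Iterable[str],
             protocol: str, rate: float) -> Optional[CommUnit]:
    """Cheapest unit that offers all services, speaks the protocol, and meets the rate."""
    needed = set(services)
    candidates = [u for u in library
                  if needed <= u.services and u.protocol == protocol and u.max_rate >= rate]
    return min(candidates, key=lambda u: u.cost) if candidates else None


LIBRARY = [
    CommUnit("hs8", frozenset({"send", "receive"}), "handshake", 1e6, 2.0),
    CommUnit("hs32", frozenset({"send", "receive"}), "handshake", 4e6, 5.0),
    CommUnit("fifo", frozenset({"send"}), "fifo", 8e6, 3.0),
]
```

A handshake channel that needs 2e6 words per second rules out the cheaper "hs8" unit on data rate alone, so "hs32" is selected.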
Cache modeling for co-synthesis
Cache state affects task execution time.
Li and Wolf used a two-state model for processes in the cache: one execution time if the process is in the cache, another if it is not.
This model is more abstract than a cache-line model.
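The two-state model can be sketched by giving each process a hit time and a miss time and tracking which processes are resident. The residency rule here (the cache holds a fixed number of whole processes, replaced least recently used) is an assumption added to make the sketch executable; the two-state model itself only specifies the two execution times.

```python
from collections import OrderedDict
from typing import Dict, List


def sequence_time(order: List[str], t_hit: Dict[str, float],
                  t_miss: Dict[str, float], capacity: int) -> float:
    """Total time to run processes in `order` on one CPU.

    Assumption: the cache holds up to `capacity` whole processes, replaced LRU.
    """
    resident: "OrderedDict[str, None]" = OrderedDict()
    total = 0.0
    for p in order:
        if p in resident:
            total += t_hit[p]           # process already in cache
            resident.move_to_end(p)     # mark as most recently used
        else:
            total += t_miss[p]          # process must be loaded
            resident[p] = None
            if len(resident) > capacity:
                resident.popitem(last=False)  # evict least recently used
    return total


T_HIT = {"A": 1.0, "B": 2.0}
T_MISS = {"A": 3.0, "B": 5.0}
```

Running A, A, B, A costs 3 + 1 + 5 + 3 = 12 when the cache holds one process, but 3 + 1 + 5 + 1 = 10 when it holds two. This is why the order of processes in a schedule affects total execution time.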
[Li99] © 1999 IEEE
Co-synthesis with caches
System cost:
Hierarchical scheduling algorithm:
Schedule tasks (each task contains one or more processes) over the hyperperiod.
Refine the schedule by moving processes within a task.
Dynamic urgency models how a process uses the cache:
Wuytack et al.
Methodology for dynamic memory management:
1. Define the application using abstract data types (ADTs).
2. Refine the ADTs into concrete data structures.
3. Divide virtual memory among several memory managers.
4. Split virtual memory segments into groups to parallelize data accesses.
5. Order background memory accesses to optimize bandwidth.
6. Allocate physical memories.
Co-synthesis for reconfigurable systems
The FPGA fabric can hold different accelerators at different times.
The combinations of accelerators that can be loaded together may be limited, so co-synthesis must take the floorplan into account.
The schedule must take reconfiguration time and energy into account.
CORDS
CORDS uses evolutionary algorithms similar to MOGAC. Adds reconfiguration delay to costs based on
It adds reconfiguration delay to the costs, based on the current schedule state.
The dynamic priority of a task depends on its slack plus its reconfiguration delay.
It increases the dynamic priority of tasks with low reconfiguration time, so that several reconfigurations can be grouped together to save energy.
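A list-scheduling step that uses slack plus reconfiguration delay can be sketched as follows. The scoring rule (smallest slack + delay runs first, with zero delay when the task's configuration is already loaded) is a hypothetical simplification of the idea on this slide, not CORDS's actual priority function; task names and times are invented.

```python
from typing import Dict, List, NamedTuple, Optional


class Task(NamedTuple):
    name: str
    slack: float   # time to spare before the task would miss its deadline
    config: str    # FPGA configuration the task needs


def reconfig_delay(config: str, loaded: Optional[str], load_time: Dict[str, float]) -> float:
    """No delay if the needed configuration is already on the fabric."""
    return 0.0 if config == loaded else load_time[config]


def pick_next(ready: List[Task], loaded: Optional[str], load_time: Dict[str, float]) -> Task:
    """Run the ready task with the smallest slack + reconfiguration delay first.

    Tasks that reuse the loaded configuration get a lower score (higher priority),
    which tends to group tasks by configuration and avoid reconfigurations.
    """
    return min(ready, key=lambda t: t.slack + reconfig_delay(t.config, loaded, load_time))


READY = [Task("t1", slack=5.0, config="fir"), Task("t2", slack=3.0, config="fft")]
LOAD_TIME = {"fir": 4.0, "fft": 4.0}
```

With the "fir" configuration already loaded, t1 scores 5 and t2 scores 7, so t1 runs first even though it has more slack; with an empty fabric, t2 wins on slack alone.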
Nimble
Performs fine-grained partitioning for instruction-level parallelism.
Platform described in architecture description language.
Program represented as control flow graph.
Selects interesting parts of loops by analyzing control dependence graph.
[Li00] © 2000 IEEE
Hardware/software co-simulation
Co-simulation must connect models that use different models of computation and different time scales.
Simulation backplane manages communication.
Becker et al. used the PLI in Verilog-XL to add C code that communicates with the software models, and used UNIX networking to connect the hardware simulator to the software side.
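The backplane's job of reconciling time scales can be sketched in a single process: a cycle-counted hardware model and a coarse-grained software model exchange messages only at shared synchronization points. This toy uses in-process lists rather than the PLI or sockets, and both models (including the "hardware doubles its input" behavior) are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class HardwareModel:
    """Toy cycle-level model: returns 2 * value, `latency` cycles after a request."""

    def __init__(self, latency_cycles: int) -> None:
        self.cycle = 0
        self.latency = latency_cycles
        self.inflight: Deque[Tuple[int, int]] = deque()  # (ready cycle, value)

    def accept(self, value: int) -> None:
        self.inflight.append((self.cycle + self.latency, value))

    def run_until(self, cycle: int) -> List[int]:
        """Advance to `cycle` and return results that completed by then."""
        done = []
        while self.inflight and self.inflight[0][0] <= cycle:
            done.append(self.inflight.popleft()[1] * 2)
        self.cycle = cycle
        return done


class SoftwareModel:
    """Toy software model: issues one request per sync point, collects results."""

    def __init__(self, requests: List[int]) -> None:
        self.todo = deque(requests)
        self.results: List[int] = []

    def step(self, responses: List[int]) -> List[int]:
        self.results += responses
        return [self.todo.popleft()] if self.todo else []


def backplane(hw: HardwareModel, sw: SoftwareModel, sync_cycles: int, max_syncs: int) -> None:
    """Alternate the two models, exchanging messages every `sync_cycles` cycles."""
    responses: List[int] = []
    for k in range(1, max_syncs + 1):
        for v in sw.step(responses):   # software side runs one coarse step
            hw.accept(v)
        responses = hw.run_until(k * sync_cycles)  # hardware catches up in cycles
```

With a 5-cycle hardware latency and synchronization every 4 cycles, each result reaches the software one synchronization point after it completes; the choice of synchronization interval trades simulation speed against timing accuracy.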
Mentor Graphics Seamless
Hardware modules described using standard HDLs.
Software can be loaded as C source or as a binary.
A bus interface module connects the hardware models to the processor instruction set simulator.
A coherent memory server manages shared memory.