
High Performance Embedded Computing

© 2007 Elsevier

Chapter 7, part 3: Hardware/Software Co-Design

Wayne Wolf


Topics

Multi-objective optimization. Co-synthesis for control. Co-synthesis for caches. Co-synthesis for reconfigurable platforms. Hardware/software co-simulation.


Multi-objective optimization

Operations research provides notions for optimization functions with multiple objectives.

Pareto optimality: a solution is Pareto-optimal if no objective can be improved without making another objective worse.
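In symbols (the standard definition for a minimization problem, added here for reference): a solution x dominates y when

\forall i:\; f_i(x) \le f_i(y) \quad \text{and} \quad \exists j:\; f_j(x) < f_j(y)

A solution is Pareto-optimal (noninferior) if no feasible solution dominates it.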


GOPS

Feasibility factor computed from throughput factors.

Upper-bound throughput for RMS:

Upper-bound throughput for EDF:
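The formulas themselves are not shown here; for reference, the classic Liu-Layland schedulability bounds that such throughput limits are typically built on (standard results, not taken from the slide) are

U_{RMS} = \sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right), \qquad U_{EDF} = \sum_{i=1}^{n} \frac{C_i}{T_i} \le 1

where C_i is the worst-case computation time and T_i the period of task i.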


Upper bound feasibility

Upper-bound feasibility tests:


Lower bound feasibility test

Lower bound:


Feasibility factor

Feasibility factor P:

Use feasibility factor to prune the search space and as an optimization objective.
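A minimal sketch of using a feasibility factor both to prune and to rank candidate architectures; feasibility_factor() and the threshold below are hypothetical stand-ins, not the GOPS formulation.

def prune_and_rank(candidates, feasibility_factor, cost, threshold=0.0):
    # Pruning: drop candidate architectures whose feasibility factor is
    # below the threshold, so they never reach detailed evaluation.
    survivors = [c for c in candidates if feasibility_factor(c) >= threshold]
    # Objective: among the survivors, prefer lower cost and, for equal
    # cost, higher feasibility.
    return sorted(survivors, key=lambda c: (cost(c), -feasibility_factor(c)))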


Genetic algorithms

Modeled as: Genes = strings of symbols. Mutations = changes to strings.

Types of moves: Reproduction makes a copy of a string. Mutation changes a string. Crossover interchanges parts of two strings.
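A minimal sketch of the three moves on symbol strings (generic genetic-algorithm operators, not any particular co-synthesis tool):

import random

def reproduce(s):
    # Reproduction: make a copy of a string.
    return list(s)

def mutate(s, symbols, rate=0.05):
    # Mutation: randomly change some symbols of the string.
    return [random.choice(symbols) if random.random() < rate else g for g in s]

def crossover(a, b):
    # Crossover: interchange the parts of two strings after a random cut.
    cut = random.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]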


MOGAC

Technology tables characterize hardware components.

Genetic model: The processing element allocation string lists all PEs and their types. The task allocation string shows the assignment of tasks to PEs. The link allocation string maps communication to links. The IC allocation string maps tasks to chips.
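One way to picture these strings as a data structure; the field names and types below are illustrative assumptions, not MOGAC's actual encoding.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Chromosome:
    pe_allocation: List[str] = field(default_factory=list)        # PE instances and their types
    task_allocation: Dict[str, int] = field(default_factory=dict) # task name -> PE index
    link_allocation: Dict[str, int] = field(default_factory=dict) # communication edge -> link index
    ic_allocation: Dict[str, int] = field(default_factory=dict)   # task name -> chip index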


MOGAC optimization procedure

Forms an initial solution, then repeats an evolve/evaluate cycle. Evaluation determines the noninferior solutions; some noninferior solutions may not survive evolution. Clusters solutions to reduce run time. [Dic98] © 1998 IEEE


MOGAC constraints

nis(x): noninferior solutions in x. dom(a,b) = 1 if a is not dominated by b. Cluster rank:
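A minimal sketch of the dominance test and noninferior-solution set these definitions describe (minimization assumed; the cluster-rank formula itself is not reproduced here):

def dom(a, b):
    # dom(a, b) = 1 if a is not dominated by b; a and b are tuples of
    # objective values to be minimized.
    b_no_worse = all(y <= x for x, y in zip(a, b))
    b_better = any(y < x for x, y in zip(a, b))
    return 0 if (b_no_worse and b_better) else 1

def nis(solutions):
    # nis(x): the solutions in x that no other solution in x dominates.
    return [a for a in solutions
            if all(dom(a, b) for b in solutions if b is not a)]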


Energy-aware task scheduling

Yang et al. schedule multiprocessors for energy. They combine design-time and run-time phases:

At design time, scheduler evaluates scheduling/allocation choices; optimizes with genetic algorithms; generates table.

At run time, heuristics use the table to choose best scheduling/allocation pattern.
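A minimal sketch of the table-driven scheme, assuming a hypothetical table keyed by the set of active tasks and filled with precomputed (energy, pattern) pairs:

# Hypothetical design-time table: for each workload scenario, a list of
# (energy, scheduling/allocation pattern) pairs found off-line, e.g. by a
# genetic algorithm.
DESIGN_TIME_TABLE = {
    frozenset({"audio", "video"}): [(120, "pattern_A"), (95, "pattern_B")],
    frozenset({"audio"}):          [(40, "pattern_C")],
}

def runtime_select(active_tasks, energy_budget):
    # Run-time heuristic: a cheap table lookup instead of on-line
    # optimization; pick the lowest-energy pattern that fits the budget.
    entries = DESIGN_TIME_TABLE.get(frozenset(active_tasks), [])
    feasible = [(e, p) for e, p in entries if e <= energy_budget]
    return min(feasible, default=None)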


Co-synthesis for wireless

Wireless systems are bandwidth and energy limited.

COWLS uses parallel recombinative simulated annealing. Solutions are ranked by communication time, computation time, and utilization. Scheduling influences both power consumption and timing; slack determines idle time.


Control and I/O synthesis

The co-design finite-state machine (CFSM) model describes control-dominated systems.

It is an event-driven model with finite, non-zero, but unbounded reaction times. Implementations: hardware is logic guarded by latches; software is synthesized from an s-graph that models the control flow. The CFSM model can be used as an intermediate representation for Esterel, etc.


Modal process model

Chou et al. use modal models: I/O behavior depends on the current mode and on inputs. Abstract control types define control operations with known properties.


Interface synthesis

Chou et al. represent I/O as control flow graphs. They generate tasks, allocate I/O ports, split wide-word operations, use memory-mapped I/O where necessary, and generate an I/O sequencer.

Daveau et al. synthesize communication by allocating operations to units in a library. A communication unit must provide the required services, use the right protocol, and run at the required data rate.
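A minimal sketch of this kind of library matching; the unit attributes below are hypothetical, not Daveau et al.'s actual library format.

from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class CommUnit:
    name: str
    services: FrozenSet[str]   # e.g. {"fifo", "handshake"}
    protocol: str              # e.g. "serial", "shared-bus"
    max_rate_mbps: float

def select_units(library: List[CommUnit], needed: FrozenSet[str],
                 protocol: str, rate_mbps: float) -> List[CommUnit]:
    # Keep only units that provide the required services, use the right
    # protocol, and can run at the required data rate.
    return [u for u in library
            if needed <= u.services
            and u.protocol == protocol
            and u.max_rate_mbps >= rate_mbps]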


Cache modeling for co-synthesis

Cache state affects task execution time. Li and Wolf used a two-state model for processes in the cache: one execution time if the process is in the cache, another if it is not. This model is more abstract than a cache-line model. [Li99] © 1999 IEEE
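The two-state model reduces to a simple timing lookup; a sketch with hypothetical per-process times:

def execution_time(process, cache_contents, t_in_cache, t_not_in_cache):
    # Two-state model: one execution time if the process is resident in
    # the cache, a longer one if it is not.
    return t_in_cache if process in cache_contents else t_not_in_cache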


Co-synthesis with caches

System cost:

Hierarchical scheduling algorithm: schedule tasks (each consisting of one or more processes) over the hyperperiod, then refine the schedule by moving processes within a task. Dynamic urgency models how a process uses the cache:


Wuytack et al.

Methodology for dynamic memory management:
1. Define the application using abstract data types.
2. Refine the ADTs into concrete data structures.
3. Divide virtual memory among several memory managers.
4. Split virtual memory segments into groups to parallelize data accesses.
5. Order background memory accesses to optimize bandwidth.
6. Allocate physical memories.


Co-synthesis for reconfigurable systems

An FPGA fabric can hold different accelerators at different times. Combinations of accelerators may be limited, and the floorplan must be taken into account. The schedule must take reconfiguration time and energy into account.


CORDS

CORDS uses evolutionary algorithms similar to MOGAC. It adds reconfiguration delay to costs based on the current schedule state. The dynamic priority of a task depends on its slack plus its reconfiguration delay. CORDS increases the dynamic priority of tasks with low reconfiguration time to group several reconfigurations together and save energy.
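A minimal sketch of the priority rule as described; the threshold and boost constants are hypothetical.

# Hypothetical constants: tasks needing less reconfiguration time than the
# threshold get a priority boost so reconfigurations cluster together.
RECONFIG_THRESHOLD = 1.0e-3   # seconds, illustrative
PRIORITY_BOOST = 10.0         # illustrative

def dynamic_priority(slack, reconfig_delay):
    # Priority depends on slack plus reconfiguration delay; tasks with low
    # reconfiguration time are boosted, which tends to group several
    # reconfigurations together and save energy.
    boost = PRIORITY_BOOST if reconfig_delay < RECONFIG_THRESHOLD else 0.0
    return slack + reconfig_delay + boost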


Nimble

Performs fine-grained partitioning for instruction-level parallelism.

Platform described in architecture description language.

Program represented as control flow graph.

Selects interesting parts of loops by analyzing control dependence graph.

[Li00] © 2000 IEEE


Hardware/software co-simulation

Must connect models with different models of computation and different time scales.

Simulation backplane manages communication.

Becker et al. used the PLI in Verilog-XL to add C code that communicates with software models, and UNIX networking to connect to the hardware simulator.
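A minimal illustration of coupling a software model to a hardware simulator over a socket (purely illustrative, not Becker et al.'s interface): the software side sends a transaction and blocks until the hardware side replies, which loosely synchronizes the two time scales.

import socket

HOST, PORT = "localhost", 9000   # hypothetical backplane endpoint

def sw_write_transaction(address, data):
    # Software-model side: send one write transaction to the hardware
    # simulator process and wait for its acknowledgement.
    with socket.create_connection((HOST, PORT)) as s:
        s.sendall(f"WRITE {address:#x} {data:#x}\n".encode())
        return s.makefile().readline().strip()   # e.g. "ACK @ 1250 ns"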


Mentor Graphics Seamless

Hardware modules described using standard HDLs.

Software can be loaded as C or binary. A bus interface module connects hardware models to the processor instruction-set simulator. A coherent memory server manages shared memory.