
High Performance Embedded Computing

© 2007 Elsevier

Chapter 7, part 3: Hardware/Software Co-Design

Wayne Wolf


Topics

Multi-objective optimization. Co-synthesis for control. Co-synthesis for caches. Co-synthesis for reconfigurable platforms. Hardware/software co-simulation.


Multi-objective optimization

Operations research provides notions for optimization functions with multiple objectives.

Pareto optimality: a solution is Pareto-optimal if no objective can be improved without making another objective worse.
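In symbols (the standard definition for a minimization problem, added here for reference): a solution x dominates y when

\forall i:\; f_i(x) \le f_i(y) \quad \text{and} \quad \exists j:\; f_j(x) < f_j(y)

A solution is Pareto-optimal (noninferior) if no feasible solution dominates it.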


GOPS

Feasibility factor computed from throughput factors.

Upper-bound throughput for RMS:

Upper-bound throughput for EDF:
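The formulas themselves are not shown here; for reference, the classic Liu-Layland schedulability bounds that such throughput limits are typically built on (standard results, not taken from the slide) are

U_{RMS} = \sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right), \qquad U_{EDF} = \sum_{i=1}^{n} \frac{C_i}{T_i} \le 1

where C_i is the worst-case computation time and T_i the period of task i.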


Upper bound feasibility

Upper-bound feasibility tests:


Lower bound feasibility test

Lower bound:


Feasibility factor

Feasibility factor P:

Use feasibility factor to prune the search space and as an optimization objective.
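A minimal sketch of using a feasibility factor both to prune and to rank candidate architectures; feasibility_factor() and the threshold below are hypothetical stand-ins, not the GOPS formulation.

def prune_and_rank(candidates, feasibility_factor, cost, threshold=0.0):
    # Pruning: drop candidate architectures whose feasibility factor is
    # below the threshold, so they never reach detailed evaluation.
    survivors = [c for c in candidates if feasibility_factor(c) >= threshold]
    # Objective: among the survivors, prefer lower cost and, for equal
    # cost, higher feasibility.
    return sorted(survivors, key=lambda c: (cost(c), -feasibility_factor(c)))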


Genetic algorithms

Modeled as: Genes = strings of symbols. Mutations = changes to strings.

Types of moves: Reproduction makes a copy of a string. Mutation changes a string. Crossover interchanges parts of two strings.
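A minimal sketch of the three moves on symbol strings (generic genetic-algorithm operators, not any particular co-synthesis tool):

import random

def reproduce(s):
    # Reproduction: make a copy of a string.
    return list(s)

def mutate(s, symbols, rate=0.05):
    # Mutation: randomly change some symbols of the string.
    return [random.choice(symbols) if random.random() < rate else g for g in s]

def crossover(a, b):
    # Crossover: interchange the parts of two strings after a random cut.
    cut = random.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]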


MOGAC

Technology tables characterize hardware components.

Genetic model: The processing element allocation string lists all PEs and their types. The task allocation string shows the assignment of tasks to PEs. The link allocation string maps communication to links. The IC allocation string maps tasks to chips.
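One way to picture these strings as a data structure; the field names and types below are illustrative assumptions, not MOGAC's actual encoding.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Chromosome:
    pe_allocation: List[str] = field(default_factory=list)        # PE instances and their types
    task_allocation: Dict[str, int] = field(default_factory=dict) # task name -> PE index
    link_allocation: Dict[str, int] = field(default_factory=dict) # communication edge -> link index
    ic_allocation: Dict[str, int] = field(default_factory=dict)   # task name -> chip index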


MOGAC optimization procedure

Forms an initial solution, then repeats an evolve/evaluate cycle. Evaluation determines the noninferior solutions; some noninferior solutions may not survive evolution. Clusters solutions to reduce run time. [Dic98] © 1998 IEEE


MOGAC constraints

nis(x): noninferior solutions in x. dom(a,b) = 1 if a is not dominated by b. Cluster rank:
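A minimal sketch of the dominance test and noninferior-solution set these definitions describe (minimization assumed; the cluster-rank formula itself is not reproduced here):

def dom(a, b):
    # dom(a, b) = 1 if a is not dominated by b; a and b are tuples of
    # objective values to be minimized.
    b_no_worse = all(y <= x for x, y in zip(a, b))
    b_better = any(y < x for x, y in zip(a, b))
    return 0 if (b_no_worse and b_better) else 1

def nis(solutions):
    # nis(x): the solutions in x that no other solution in x dominates.
    return [a for a in solutions
            if all(dom(a, b) for b in solutions if b is not a)]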


Energy-aware task scheduling

Yang et al. schedule multiprocessors for energy. They combine design-time and run-time phases:

At design time, scheduler evaluates scheduling/allocation choices; optimizes with genetic algorithms; generates table.

At run time, heuristics use the table to choose best scheduling/allocation pattern.
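A minimal sketch of the table-driven scheme, assuming a hypothetical table keyed by the set of active tasks and filled with precomputed (energy, pattern) pairs:

# Hypothetical design-time table: for each workload scenario, a list of
# (energy, scheduling/allocation pattern) pairs found off-line, e.g. by a
# genetic algorithm.
DESIGN_TIME_TABLE = {
    frozenset({"audio", "video"}): [(120, "pattern_A"), (95, "pattern_B")],
    frozenset({"audio"}):          [(40, "pattern_C")],
}

def runtime_select(active_tasks, energy_budget):
    # Run-time heuristic: a cheap table lookup instead of on-line
    # optimization; pick the lowest-energy pattern that fits the budget.
    entries = DESIGN_TIME_TABLE.get(frozenset(active_tasks), [])
    feasible = [(e, p) for e, p in entries if e <= energy_budget]
    return min(feasible, default=None)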


Co-synthesis for wireless

Wireless systems are bandwidth and energy limited.

COWLS uses parallel recombinative simulated annealing. Solutions are ranked by communication time, computation time, and utilization. Scheduling influences both power consumption and timing; slack determines idle time.


Control and I/O synthesis

The co-design finite-state machine (CFSM) model describes control-dominated systems.

It is an event-driven model with finite, non-zero, but unbounded reaction times. Implementations: hardware is logic guarded by latches; software is synthesized from an s-graph that models the control flow. The CFSM model can be used as an intermediate representation for Esterel, etc.


Modal process model

Chou et al. use modal models: I/O behavior depends on the current mode and on inputs. Abstract control types define control operations with known properties.


Interface synthesis

Chou et al. represent I/O as control flow graphs. They generate tasks, allocate I/O ports, split wide-word operations, use memory-mapped I/O where necessary, and generate an I/O sequencer.

Daveau et al. synthesize communication by allocating operations to units in a library. A communication unit must provide the required services, use the right protocol, and run at the required data rate.
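A minimal sketch of this kind of library matching; the unit attributes below are hypothetical, not Daveau et al.'s actual library format.

from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class CommUnit:
    name: str
    services: FrozenSet[str]   # e.g. {"fifo", "handshake"}
    protocol: str              # e.g. "serial", "shared-bus"
    max_rate_mbps: float

def select_units(library: List[CommUnit], needed: FrozenSet[str],
                 protocol: str, rate_mbps: float) -> List[CommUnit]:
    # Keep only units that provide the required services, use the right
    # protocol, and can run at the required data rate.
    return [u for u in library
            if needed <= u.services
            and u.protocol == protocol
            and u.max_rate_mbps >= rate_mbps]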


Cache modeling for co-synthesis

Cache state affects task execution time. Li and Wolf used a two-state model for processes in the cache: one execution time if the process is in the cache, another if it is not. This model is more abstract than a cache-line model. [Li99] © 1999 IEEE
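The two-state model reduces to a simple timing lookup; a sketch with hypothetical per-process times:

def execution_time(process, cache_contents, t_in_cache, t_not_in_cache):
    # Two-state model: one execution time if the process is resident in
    # the cache, a longer one if it is not.
    return t_in_cache if process in cache_contents else t_not_in_cache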


Co-synthesis with caches

System cost:

Hierarchical scheduling algorithm: schedule tasks (each consisting of one or more processes) over the hyperperiod, then refine the schedule by moving processes within a task. Dynamic urgency models how a process uses the cache:


Wuytack et al.

Methodology for dynamic memory management:
1. Define the application using abstract data types.
2. Refine the ADTs into concrete data structures.
3. Divide virtual memory among several memory managers.
4. Split virtual memory segments into groups to parallelize data accesses.
5. Order background memory accesses to optimize bandwidth.
6. Allocate physical memories.


Co-synthesis for reconfigurable systems

An FPGA fabric can hold different accelerators at different times. Combinations of accelerators may be limited, and the floorplan must be taken into account. The schedule must take reconfiguration time and energy into account.


CORDS

CORDS uses evolutionary algorithms similar to MOGAC. It adds reconfiguration delay to costs based on the current schedule state. The dynamic priority of a task depends on its slack plus its reconfiguration delay. CORDS increases the dynamic priority of tasks with low reconfiguration time to group several reconfigurations together and save energy.
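A minimal sketch of the priority rule as described; the threshold and boost constants are hypothetical.

# Hypothetical constants: tasks needing less reconfiguration time than the
# threshold get a priority boost so reconfigurations cluster together.
RECONFIG_THRESHOLD = 1.0e-3   # seconds, illustrative
PRIORITY_BOOST = 10.0         # illustrative

def dynamic_priority(slack, reconfig_delay):
    # Priority depends on slack plus reconfiguration delay; tasks with low
    # reconfiguration time are boosted, which tends to group several
    # reconfigurations together and save energy.
    boost = PRIORITY_BOOST if reconfig_delay < RECONFIG_THRESHOLD else 0.0
    return slack + reconfig_delay + boost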


Nimble

Performs fine-grained partitioning for instruction-level parallelism.

Platform described in architecture description language.

Program represented as control flow graph.

Selects interesting parts of loops by analyzing control dependence graph.

[Li00] © 2000 IEEE


Hardware/software co-simulation

Must connect models with different models of computation and different time scales.

Simulation backplane manages communication.

Becker et al. used the PLI in Verilog-XL to add C code that communicates with software models, and UNIX networking to connect to the hardware simulator.
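A minimal illustration of coupling a software model to a hardware simulator over a socket (purely illustrative, not Becker et al.'s interface): the software side sends a transaction and blocks until the hardware side replies, which loosely synchronizes the two time scales.

import socket

HOST, PORT = "localhost", 9000   # hypothetical backplane endpoint

def sw_write_transaction(address, data):
    # Software-model side: send one write transaction to the hardware
    # simulator process and wait for its acknowledgement.
    with socket.create_connection((HOST, PORT)) as s:
        s.sendall(f"WRITE {address:#x} {data:#x}\n".encode())
        return s.makefile().readline().strip()   # e.g. "ACK @ 1250 ns"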


Mentor Graphics Seamless

Hardware modules described using standard HDLs.

Software can be loaded as C or binary. A bus interface module connects hardware models to the processor instruction-set simulator. A coherent memory server manages shared memory.