Michel Goraczko, Jie Liu (Microsoft Research, Redmond)
Dimitrios Lymberopoulos (Yale University)
Slobodan Matic (UC Berkeley)
Bodhi Priyantha, Feng Zhao (Microsoft Research, Redmond)
Presentation at DAC 2008, Anaheim, CA
June 10th, 2008
Energy-Optimal Software Partitioning in Heterogeneous Multiprocessor Embedded Systems
Energy Usage in Embedded Applications
Low duty cycle monitoring for long battery life
High throughput for processing real-time critical events
Examples: mobile devices, patient monitoring, smart environments
Energy Performance Diversity
• A single processor with DVFS may not be flexible enough.
― Energy efficiency in embedded processors
― Non-trivial wake-up latency and energy costs
Platform  Execution time  Energy   Relative speed  Relative efficiency
PXA255    24.8us          12.9uJ   207.7           28.4
ARM7      330us           49.2uJ   15.6            7.5
Atmega    5.15ms          367uJ    1               1

PXA255    325us           166.9uJ  516.9           70.7
ARM7      4.8ms           699uJ    35              17.6
Atmega    168ms           11.8mJ   1               1

PXA255    94.5us          45.8uJ   153.4           20.4
ARM7      1.2ms           187uJ    12.1            5
Atmega    14.5ms          934uJ    1               1

Benchmarks: FFT, CRC-32, FIR
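The two relative columns read as ratios against the Atmega baseline: relative speed is the Atmega execution time divided by the platform's execution time, and relative efficiency is the Atmega energy divided by the platform's energy. A minimal Python sketch using the first three rows of the table (the function and variable names are illustrative and do not come from the slides):

```python
# Execution time (seconds) and energy (joules) for the first three table rows.
first_group = {
    "PXA255": (24.8e-6, 12.9e-6),
    "ARM7": (330e-6, 49.2e-6),
    "Atmega": (5.15e-3, 367e-6),
}


def relative_to_baseline(rows, baseline="Atmega"):
    """Return {platform: (relative speed, relative efficiency)} versus the baseline."""
    base_time, base_energy = rows[baseline]
    return {
        name: (round(base_time / t, 1), round(base_energy / e, 1))
        for name, (t, e) in rows.items()
    }


ratios = relative_to_baseline(first_group)
for name, (speed, efficiency) in ratios.items():
    print(f"{name}: relative speed {speed}, relative efficiency {efficiency}")
```

The same function can be applied to the other two row groups by swapping in their times and energies.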
Heterogeneous Multi-Processor Platforms
UCLA LEAP platform
MSR mPlatform
Outline
Introduction
Design Flow
Power State Machine
ILP Formulation and Optimization
A Sound Source Localization Case Study
Software Partitioning Problem
Given a time-sensitive application, allocate software components to different processors to minimize energy consumption without violating timing constraints.
[Figure: design flow. Labels: Tasks, Processor modes, Timing Analysis, Task timing, Application structure/requirements, Power model, Partitioning, Task-Processor-Mode assignments]
Power State Machines
States and power:
STBY: ~0 mW
IDLE: 0.25 mW
60MHz: 141 mW
30MHz: 72 mW
7.5MHz: 20 mW
Transition costs labeled on the diagram:
1.53 mJ, 24.5 ms
0.1 mJ, 1.4 ms
1.47 mJ, 23.8 ms
negligible (other labeled transitions)
Software Model
Directed acyclic graph of tasks
Single-rate periodic execution
Known release time
Known end-to-end deadline
Worst-case execution time T for each task, processor p, and mode m
Pre-assignments
ILP: Variables and Objective
Core binary variables: task-to-processor assignment; task-to-mode assignment; task transition assignment
Core integer variables: task start time instances
Derived variables: auxiliary variables introduced to express the problem in ILP form
Objective: minimize total energy per iteration
[Garbled in extraction: three O(...) size expressions in n, p, and m, apparently one per variable class above]
ILP: Constraints
A task can only be allocated to one processor and one mode;
A processor can only execute one task at any time;
Waking up from sleep modes takes time;
Processor total utilization should be less than 1;
Tasks have dependencies within an iteration;
Tasks have dependencies across iteration boundaries;
No task can start before its release time;
All tasks should finish by the deadline;
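To make the objective and a few of these constraints concrete, here is a small Python sketch. It is not the ILP from the talk and uses no ILP solver: it simply enumerates every assignment for a toy chain of three tasks (FFT, then SC, then HT) run back to back. The power and timing numbers are taken from the Hardware Model and Task Profiling slides. The simplifications are assumptions: wake-up costs, idle power, mode transitions, parallel execution, and cross-iteration dependencies are all ignored. The MSP430 "1/8 speed" power is assumed to correspond to 0.75MHz, and HT is only offered the ARM7 because the slides list no MSP430 time for it.

```python
from itertools import product

# Active power in mW per (processor, mode). Mapping 1/8 speed to 0.75MHz is an assumption.
POWER_MW = {
    ("ARM7", "60MHz"): 141.0, ("ARM7", "30MHz"): 72.0, ("ARM7", "7.5MHz"): 20.0,
    ("MSP430", "6MHz"): 10.8, ("MSP430", "3MHz"): 2.7, ("MSP430", "0.75MHz"): 1.4,
}

# Execution time in ms per task and (processor, mode).
TIME_MS = {
    "FFT": {("ARM7", "60MHz"): 7.8, ("ARM7", "30MHz"): 15.6, ("ARM7", "7.5MHz"): 39.6,
            ("MSP430", "6MHz"): 99.2, ("MSP430", "3MHz"): 196.0,
            ("MSP430", "0.75MHz"): 792.0},
    "SC": {("ARM7", "60MHz"): 4.4, ("ARM7", "30MHz"): 9.0, ("ARM7", "7.5MHz"): 23.3,
           ("MSP430", "6MHz"): 37.2, ("MSP430", "3MHz"): 76.0,
           ("MSP430", "0.75MHz"): 300.0},
    "HT": {("ARM7", "60MHz"): 111.0, ("ARM7", "30MHz"): 222.0, ("ARM7", "7.5MHz"): 567.0},
}


def best_assignment(tasks, deadline_ms):
    """Exhaustively pick one (processor, mode) per task for a sequential chain.

    Returns (energy in uJ, {task: (processor, mode)}) or None if no choice
    meets the deadline.
    """
    best = None
    choices = [list(TIME_MS[task].items()) for task in tasks]
    for combo in product(*choices):            # each task gets exactly one option
        finish_ms = sum(t for _, t in combo)   # tasks run one after another
        if finish_ms > deadline_ms:            # all tasks must finish by the deadline
            continue
        energy_uj = sum(POWER_MW[pm] * t for pm, t in combo)  # mW * ms = uJ
        if best is None or energy_uj < best[0]:
            best = (energy_uj, dict(zip(tasks, (pm for pm, _ in combo))))
    return best


print(best_assignment(["FFT", "SC", "HT"], deadline_ms=1000))
```

Enumeration grows exponentially with the number of tasks, which is one reason to hand a problem like this to an ILP solver instead. The sketch is only meant to show how the energy objective and the deadline constraint interact.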
Case Study
Sound Source Localization
S – Audio Sampling
FFT – Fast Fourier Transform
SC – Noise Estimation & Signal Classification
HT – Hypothesis Testing
VOTE – Sound Detection Voting
[Figure: task graph with four FFT blocks, four SC blocks, VOTE, and HT]
Hardware Model
Power (mW)     ARM7 @ 2.5V          MSP430 @ 3V
               (60MHz full speed)   (6MHz full speed)
Full speed     141                  10.8
1/2 speed      72                   2.7
1/8 speed      20                   1.4
Idle           0.25                 ~0
Standby        ~0                   ~0

Wake-up        ARM7 @ 2.5V               MSP430 @ 3V
               Energy (mJ)  Time (ms)    Energy (mJ)  Time (ms)
To full speed  1.5          24.5         ~0           0.006
To 1/8 speed   0.1          1.4          ~0           ~0
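Because waking the ARM7 has a fixed energy cost, dropping to standby only pays off when the processor would otherwise idle for long enough. A minimal sketch of that break-even calculation follows. It assumes the alternative is staying in Idle at 0.25 mW, that standby draws exactly 0 mW (the slide says ~0), and that leaving Idle costs nothing. The helper name is hypothetical.

```python
def break_even_idle_s(wake_energy_mj, idle_power_mw, standby_power_mw=0.0):
    """Idle duration in seconds beyond which standby plus one wake-up beats idling."""
    saved_mw = idle_power_mw - standby_power_mw
    if saved_mw <= 0:
        raise ValueError("standby must draw less power than idle")
    return wake_energy_mj / saved_mw  # mJ / mW = s


# ARM7 wake-up energies from the table above.
to_full_speed_s = break_even_idle_s(wake_energy_mj=1.5, idle_power_mw=0.25)
to_eighth_speed_s = break_even_idle_s(wake_energy_mj=0.1, idle_power_mw=0.25)
print(to_full_speed_s, to_eighth_speed_s)
```

Under these assumptions the ARM7 would need to idle for several seconds before standby saves energy when waking to full speed. The gap must also be longer than the wake-up time itself, or the task cannot start on schedule.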
Task Profiling
Processor  Mode     FFT (ms)  SC (ms)  HT (ms)
ARM7       60MHz    7.8       4.4      111
ARM7       30MHz    15.6      9.0      222
ARM7       7.5MHz   39.6      23.3     567
MSP430     6MHz     99.2      37.2
MSP430     3MHz     196       76
MSP430     1.5MHz   394       152
MSP430     0.75MHz  792       300
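Combining these execution times with the active power of each mode gives a rough energy cost per task. The sketch below does this for HT on the ARM7, using the ARM7 powers from the Hardware Model slide. It counts active energy only, with no wake-up or idle energy, and the names are illustrative.

```python
# ARM7 active power (mW) and HT execution time (ms) per mode, from the slides.
ARM7_POWER_MW = {"60MHz": 141.0, "30MHz": 72.0, "7.5MHz": 20.0}
HT_TIME_MS = {"60MHz": 111.0, "30MHz": 222.0, "7.5MHz": 567.0}


def active_energy_mj(power_mw, time_ms):
    """Energy in mJ spent running at power_mw for time_ms."""
    return power_mw * time_ms / 1000.0  # mW * ms = uJ, divided by 1000 gives mJ


ht_energy_mj = {
    mode: active_energy_mj(ARM7_POWER_MW[mode], HT_TIME_MS[mode])
    for mode in HT_TIME_MS
}
for mode, energy in ht_energy_mj.items():
    print(f"HT on ARM7 @ {mode}: {energy:.2f} mJ in {HT_TIME_MS[mode]:.0f} ms")
```

By this estimate the 7.5MHz mode uses the least energy for HT but needs 567 ms. The 60MHz and 30MHz modes cost nearly the same energy, so the slowest mode only becomes an option when the deadline is loose.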
Partitioning Results (1)
Deadline: 128ms
Need 4 MSP430
ARM7 @ 60MHz
Total energy/iteration: 21.7mJ
Average power: 166.7mW
[Figure: schedule for one iteration, time axis ticks at 50, 100, 150. Rows: ARM7 60MHz; MSP-4, MSP-3, MSP-2, MSP-1 at 6MHz. Task labels: HT, and four FFT/SC pairs.]
Scheduling Results (2)
Deadline: 256ms
Need 2 MSP430
ARM7 @ 30MHz
Total energy/iteration: 22.1mJ
Average power: 86.4mW
[Figure: schedule for one iteration, time axis ticks at 50, 100, 150, 200, 256. Rows: ARM 30MHz; MSP4, MSP3, MSP2, MSP1 at 6MHz. Task labels: HT, and four FFT/SC pairs.]
Scheduling Results (3)
[Figure: schedule for one iteration, time axis ticks at 200, 400, 600, 800, 1000. Rows: ARM7 7.5MHz; MSP4, MSP3, MSP2, MSP1 at 6MHz. Task labels: 4xFFT, HT, and four SC blocks.]
• Deadline: 1000ms
• Need 2 MSP430
• ARM7 @ 7.5MHz
• Total energy/iteration: 16.2mJ
• Average power: 16.2mW
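The average power on each results slide can be sanity-checked as energy per iteration divided by the iteration period. The sketch below assumes the period equals the deadline, which the slides do not state explicitly. The computed values come out close to, though not always identical with, the figures on the slides.

```python
def average_power_mw(energy_mj, period_ms):
    """Average power in mW when energy_mj is spent once every period_ms."""
    return energy_mj / period_ms * 1000.0  # mJ / ms = W, times 1000 gives mW


# (deadline in ms, total energy per iteration in mJ) from the three results slides.
results = [(128, 21.7), (256, 22.1), (1000, 16.2)]
for deadline_ms, energy_mj in results:
    print(f"{deadline_ms} ms: {average_power_mw(energy_mj, deadline_ms):.1f} mW")
```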
Conclusion
Processor diversity can help save energy.
Wake-up time and energy must be considered in software partitioning.
Optimal software partitioning is NP-hard, but can be formulated as an ILP problem.
Limitations & Future Work
Execution time variations
Aperiodic tasks
Lightweight heuristics for online scheduling