Vinay Hanumaiah (1) and Sarma Vrudhula (2)
(1) Electrical Engineering, Arizona State University
(2) Computer Science Engineering, Arizona State University
Reliability-aware Thermal Management for Hard Real-time Applications on Multi-core Processors
DTM and Reliability
• High temperature greatly degrades reliability
• high peak temperature
• large number of thermal cycles
• A 10°C to 15°C increase halves reliability
• Multi-cores have large temporal and spatial thermal variations
• higher gradients cause greater reliability degradation
• requires invoking DTM more often
• DTM allows complex objectives and granular control
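The halving rule above can be turned into a quick estimate. This is a minimal sketch, assuming reliability falls exponentially with temperature rise; the function name and the default halving interval of 10°C (the low end of the 10°C to 15°C range on the slide) are illustrative choices, not the authors' model.

```python
def relative_reliability(delta_t_c: float, halving_interval_c: float = 10.0) -> float:
    """Reliability relative to baseline after a temperature rise of delta_t_c (°C).

    Assumption (not from the slides): reliability halves for every
    halving_interval_c degrees of temperature rise, i.e. exponential decay.
    """
    if halving_interval_c <= 0:
        raise ValueError("halving_interval_c must be positive")
    return 0.5 ** (delta_t_c / halving_interval_c)


print(relative_reliability(10.0))        # 0.5
print(relative_reliability(30.0, 15.0))  # 0.25
```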
Related Work
• Effects of temperature on reliability
• Coskun:Sigmetrics’07
• Lu:IEEEMICRO’05
• Minimize peak temperature with deadline constraints
• Chantem:DATE’08 (many-core, task allocation)
• Jayaseelan:ICCAD’08 (single-core, task sequence)
• Maximize throughput
• Wang:ECRTS’06 (thermal, timing, single-core)
• Murali:CODES’07 (thermal, no deadlines, many-core)
What is our Contribution?
Determine optimal speed profile
• For many-core processors
• Minimize peak temperature
• Satisfy task deadlines
• while considering start times
• including leakage dependence on temperature
Power and Thermal Model
[Figure: full HotSpot model and simplified thermal model]
Simplified thermal model:
• ignores lateral resistance
• ignores die capacitances
• lumped package
• < 6% loss in accuracy
• required for analytical analysis
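One way to read the simplified model is as a single package capacitance with purely resistive dies on top of it. The sketch below makes that reading concrete; it is an assumption-laden illustration, not the authors' equations. All names and parameter values (r_pkg, c_pkg, r_die, t_amb) are hypothetical placeholders, and leakage is assumed to be already included in the core powers.

```python
from typing import List


def step_package(t_pkg: float, core_powers: List[float], dt: float,
                 r_pkg: float = 0.5, c_pkg: float = 10.0,
                 t_amb: float = 45.0) -> float:
    """Advance the lumped package temperature by one forward-Euler step.

    Assumed model: c_pkg * dT_pkg/dt = sum(P_i) - (T_pkg - t_amb) / r_pkg
    """
    heat_in = sum(core_powers)             # W flowing from all cores into the package
    heat_out = (t_pkg - t_amb) / r_pkg     # W leaving to ambient
    return t_pkg + dt * (heat_in - heat_out) / c_pkg


def core_temperatures(t_pkg: float, core_powers: List[float],
                      r_die: float = 0.5) -> List[float]:
    """Die temperatures with no die capacitance and no lateral resistance.

    Each core responds instantly to its own power: T_i = T_pkg + r_die * P_i.
    """
    return [t_pkg + r_die * p for p in core_powers]


if __name__ == "__main__":
    t_pkg = 45.0
    powers = [10.0, 20.0]                  # W per core, illustrative values
    for _ in range(1000):                  # simulate 10 s with a 10 ms step
        t_pkg = step_package(t_pkg, powers, dt=0.01)
    print(t_pkg, core_temperatures(t_pkg, powers))
```

Because the dies carry no capacitance in this reading, the only state variable is the package temperature, which is what makes a closed-form analysis tractable.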
Problem Formulation
Objective
• Find the core speed profile that minimizes peak temperature
Given
• n tasks (instruction length, power profile)
• n cores (RC thermal model)
Constraints
• Start times and deadlines
Assumptions
• Independent and non-identical threads
• One thread per core
• Simplified thermal model
Solution Outline
• Step 1 – Find parametric optimal speed profile [Hanumaiah:DATE’09]
• Fixed maximum temperature
• No deadlines
• Step 2 – Find the parameters of Step 1 for every slot
• To satisfy task deadlines for a given initial package temperature
Solution Outline - contd
• Step 3 – For every slot
• find initial package temperature to satisfy start times
• also determine global min peak temperature
Step 1: Fixed max. temp., no deadlines
Step 2: Fixed max. temp., with deadlines
Need for Step 2
• Find the total power P_T for the corresponding T_pkg
• Find the optimal speed profile for the critical task
• Determine T_pkg over the slot
Step 2: Power allocation scheme
• Let t_sched = unit scheduling interval
• Determine approx. dT_pkg(t_sched)/dt
• Find the corresponding P_T(t_sched)
• P_T(t_sched) = P_T(t_sched) – P_critical(t_sched)
• Sort tasks according to nearest deadline
• Allocate max. power P_max,i(t_sched) to the earliest task
• P_T(t_sched) = P_T(t_sched) – P_max,i(t_sched)
• Continue until P_T(t_sched) = 0
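The allocation loop above can be sketched for a single scheduling interval as follows. This is a hedged illustration rather than the authors' implementation: the Task class, the allocate_power function, and the choice to give the last partially served task whatever budget remains are assumptions, and the critical task is assumed to be handled separately from the task list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    deadline: float  # seconds; earlier deadline means served first
    p_max: float     # W; maximum power the task's core can use this interval


def allocate_power(total_power: float, critical_power: float,
                   tasks: List[Task]) -> Dict[str, float]:
    """Greedy earliest-deadline-first power allocation for one interval."""
    # Reserve the critical task's share first, then distribute what is left.
    budget = max(total_power - critical_power, 0.0)
    allocation: Dict[str, float] = {}
    for task in sorted(tasks, key=lambda t: t.deadline):
        granted = min(task.p_max, budget)  # full p_max if it fits, else the remainder
        allocation[task.name] = granted
        budget -= granted
    return allocation


tasks = [Task("a", deadline=3.0, p_max=40.0),
         Task("b", deadline=1.0, p_max=50.0),
         Task("c", deadline=2.0, p_max=25.0)]
print(allocate_power(100.0, 30.0, tasks))
# {'b': 50.0, 'c': 20.0, 'a': 0.0}
```

In the example, 70 W remain after the critical task; task b takes its full 50 W, task c receives the remaining 20 W, and task a gets nothing in this interval.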
Step 3: Satisfy Start Times
• Instructions completed in each slot are monotonic
• in the initial package temperature of the slot
• in the maximum temperature
• Can be solved optimally as a quasiconcave (monotonic) optimization problem
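Monotonicity is what makes a simple search sufficient. The sketch below uses plain bisection as a stand-in for the monotonic optimization mentioned above; it is not taken from the slides. The function names, the temperature bounds, and the toy linear model of completed instructions are all hypothetical.

```python
from typing import Callable


def min_feasible_temperature(completed: Callable[[float], float],
                             required: float,
                             t_low: float, t_high: float,
                             tol: float = 1e-3) -> float:
    """Smallest maximum temperature at which `completed` reaches `required`.

    Assumes `completed` is non-decreasing in the maximum temperature, so the
    feasible set is an interval and bisection converges to its lower end.
    """
    if completed(t_high) < required:
        raise ValueError("required work is infeasible even at t_high")
    while t_high - t_low > tol:
        mid = 0.5 * (t_low + t_high)
        if completed(mid) >= required:
            t_high = mid   # feasible: try a lower peak temperature
        else:
            t_low = mid    # infeasible: more thermal headroom is needed
    return t_high


def toy_completed(t_max: float) -> float:
    """Toy monotonic model: 2e6 instructions per °C of headroom above 45°C."""
    return 2.0e6 * max(t_max - 45.0, 0.0)


t_star = min_feasible_temperature(toy_completed, required=60.0e6,
                                  t_low=45.0, t_high=110.0)
print(round(t_star, 2))
```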
Experimental Setup
• Multicore version of Alpha 21264
• HotSpot – thermal model, PTScalar – power model
• SPEC benchmarks
• Dynamic power – 230 W, leakage power – 60 W
• Scheduling interval – 10 ms
Trade-off: Peak Temperature vs Deadlines
[Figure panels: relaxed deadlines; tight deadlines]
Optimal Policy vs Min. Makespan Policy
[Figure panels: optimal policy with relaxed deadlines; min. makespan policy]
Discretization of Optimal Policy
[Figure panels: continuous version; discrete version with 8 speeds]
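The slides do not say how the continuous profile was mapped onto the 8 speeds. As one possible illustration, the sketch below snaps each sample of a normalized speed profile to the nearest of 8 evenly spaced levels; the level spacing, the nearest-level rule, and the function names are assumptions.

```python
from typing import List


def make_levels(count: int = 8) -> List[float]:
    """Evenly spaced normalized speed levels: 1/count, 2/count, ..., 1.0."""
    return [i / count for i in range(1, count + 1)]


def discretize(profile: List[float], levels: List[float]) -> List[float]:
    """Replace each continuous speed sample with the nearest available level."""
    return [min(levels, key=lambda level: abs(level - s)) for s in profile]


levels = make_levels(8)
print(discretize([0.3, 0.55, 0.97], levels))
# [0.25, 0.5, 1.0]
```

Rounding to the nearest level can raise or lower the speed in any given interval, so a real policy would still need to re-check deadlines and the temperature limit after discretization.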
Summary
• Proposed reliability-aware transient speed policy
• Minimizes peak temperature
• Satisfies task deadlines and start times
• Includes accurate power and thermal models
• Optimal trade-off of peak temperature with deadlines
• Incorporated in Magma simulator
• Fast, accurate thermal-aware architectural simulator
• Available as open source at http://vrudhula.lab.asu.edu/magma/