ARTIST Summer School in Europe 2010
Autrans (near Grenoble), France
September 5-10, 2010
Thermal-Aware Design of 2D/3D Multi-Processor System-on-Chip Architectures
Invited Speaker: David Atienza,
Professor and Director of Embedded Systems Laboratory (ESL), EPFL
http://www.artist-embedded.org/
Evolution of Electronics to Multi-Processor System-on-Chip (MPSoC)
CMOS roadmap continues: 90 nm, 65 nm, 45 nm
Multi-Processor System-on-Chip (MPSoC) architectures
[Figure: MPSoC block diagram with processing elements (PE), CPU, local SRAM, I/O peripherals, SDRAM main memory and local memory hierarchy]
Example MPSoCs: 80-tile 1.28 TFLOPS MPSoC, Intel [ISSCC '07]; Cell Multi-Processor (PS3)
MPSoCs are Spreading Fast
[Figure, from [Amarasinghe06]: number of cores per chip (1 to 512) versus year (1970 to 20??). Chips shown: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon, Power4, Power6, Yonah, PExtreme, Opteron, Niagara, Tanglewood, Xbox360, PA-8800, Broadcom 1480, Opteron 4P, Xeon MP, Raw, Cell, Cavium Octeon, Raza XLR, Cisco CSR-1, Picochip PC102, Ambric AM2045, Intel Tflops]
Design Issues in MPSoCs
MPSoCs have very complex architectures
• Advanced components and CAD tools very expensive
• Time-closure issues, system speed decreased
Aggravated thermal issues
• Hot-spots, non-uniform thermal gradients
• High chances of thermal wear-outs and very short lifetimes!
[Figures: Sun 1.8 GHz Sparc v9 microprocessor; Sun Niagara broadband processor. Sources: [Santarini, EDN, March '05], [Coskun et al. '07, Sun]]
Thermal Issues Become More Critical for 3D-MPSoCs
[Figure: a 2D MPSoC (PEs, SRAMs, I/O peripherals, SDRAM main memory) turned by 3D integration into a stack of layers, including a PEs layer]
More power and more non-uniform heat spreading!
Reliability Degradation Factors in MPSoCs
• Thermal hot spots
• ∆T: fatigue failures increase with:
  • Magnitude of variation
  • Frequency of cycles
  Caused by:
  • Power on/off
  • Power management (turning off cores)
• Spatial gradients
[Figure: temperature T (ºC, 60 to 90) versus time (sec)]
Advocating Thermal-Aware 2D/3D MPSoC Design
Integration of HW/SW modeling and management
• Heat flow models
• Fast thermal exploration
• HW tuning knobs
• HW thermal monitoring
• HW-based thermal management policies
• SW-based thermal management policies
Outline
Part 1: Thermal Modeling and Management for 2D MPSoCs
Part 2: Thermal Modeling and Management for 3D MPSoCs with Active Cooling
Acknowledgements: Prof. Ayse K. Coskun (Boston University and Sun Microsyst.), Dr. Srinivasan Murali (iNoCs and EPFL), Prof. Jose L. Ayala (UCM), Thomas Brunschwiler and Dr. Bruno Michel (IBM Zürich), Prof. Stephen Boyd (Stanford University)
Thermal Modeling, Analysis and Management of 2D Multi-Processor System-on-Chip
Prof. David Atienza Alonso
Embedded Systems Laboratory (ESL), Institute of EE, Faculty of Engineering
ARTIST Summer School 2010, Autrans (France)
Outline
• MPSoC thermal modeling and analysis
• HW-based thermal management for MPSoCs
• SW-based thermal management for MPSoCs
• Conclusions
MPSoC Thermal Modeling Problem
Continuous heat flow analysis
• Capture geometrical characteristics of MPSoCs
• Explore different packaging features and heat sink characteristics
Time-variant heat sources
• Transistor switching depends on MPSoC run-time activity (software)
• Dynamic interaction with heat flow analysis
Very complex computational problem!
MPSoC Thermal Modeling State-of-the-Art
MPSoC modeling and exploration
1. SW simulation: transactions, cycle-accurate (~100 kHz) [Synopsys Realview, Mentor Primecell, Madsen et al., Angiolini et al.]
   At the desired cycle-accurate level, they are too slow for thermal analysis of real-life applications!
2. HW prototyping: core dependent (~50-100 MHz) [Cadence Palladium II, ARM Integrator IP, Heron Engineering]
   Very expensive and late in design flow, no thermal modeling, only used for functional validation of MPSoC architectures!
Heat flow modeling
1. Software thermal/power models [Skadron et al., Kang et al.]
   Too computationally intensive and not able to interact at run-time with inputs from MPSoC components!
Combination of cycle-accurate MPSoC behavior and IC heat flow modeling at run-time is unheard of
Orthogonalizing MPSoC Thermal Modeling and Analysis
[Figure: MPSoC behavior (CPUs, SRAMs and I/O, each with an attached sniffer) emulated on FPGA; energy data goes to the SW thermal estimation tool on the host PC, which returns temperature (T)]
Framework: MPSoC behavioral model on reconfigurable HW interacting with efficient thermal estimation
Chip and Package Heat Flow Modeling
Model interface
• Input: power model of MPSoC components, geometrical properties
• Output: temperature of MPSoC components at run-time
Thermal circuit: 1st order RC circuit
• Heat flow ~ electrical current; temperature ~ voltage
• Heat spreader and IC composed of elementary blocks (Si and Cu cells)
• Si thermal conductivity depends on temperature (IMEC & Freescale, 90nm)
$C \dot{t}_k = -G(t_k)\, t_k + p_k, \quad k = 1, \dots, m$
where $C$ is the thermal capacitance matrix (entries $C_{si,1}, C_{si,2}, \dots, C_{cu,n}$), $G$ is the thermal conductance matrix (entries $\pm G_{1,2}, \pm G_{2,1}, \dots$), $\dot{t}_k$ is the temperature change, $t_k$ is the temperature vector at instant $k$, and $p_k$ is the power consumption vector.
[Figure: Si thermal conductivity (W/mK, 90 to 160) versus temperature (27 to 127 Celsius), actual value]
SW Thermal Estimation Tool for MPSoCs
$C \dot{t}_k = -G(t_k)\, t_k + p_k, \quad k = 1, \dots, m$
Creating linear approximation while retaining variable Si thermal conductivity:
• Si thermal conductivity linearly approx.: $G_{i,j}(t_k) = I + q\, t_k$
• Numerically integrating in discrete time domain the $t_k$:
  $t_{k+1} = A(t_k)\, t_k + B\, p_k, \quad k = 1, \dots, m$
  $A(t_k) = I - dt\, C^{-1} G(t_k), \quad B = dt\, C^{-1}$
• Time step chosen small enough for convergence
Complexity scales linearly with the number of modeled cells (simulated on P4 @ 3GHz)
Thermal library validated against 3D finite element model (IMEC & Freescale)
[Figure: Si thermal conductivity dependent on temperature (W/mK, 90 to 160, versus 27 to 127 Celsius), actual value and linear fit]
[Figure: heat flow estimation time (s, 0 to 1600) for 60 sec of MPSoC heat flow analysis versus number of cells (0 to 10000), non-linear thermal estimation versus proposed linear thermal estimation]
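The discrete update above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the validated thermal library: the cells form a 1D chain, temperatures are expressed as a rise above ambient, the capacitance matrix is diagonal, and the names `g0`, `q`, `g_amb`, `link_conductance`, `build_g`, `euler_step` and `simulate` are hypothetical.

```python
def link_conductance(t_a, t_b, g0, q):
    # Assumed linear fit: conductance evaluated at the mean of the two cell temperatures.
    return g0 + q * 0.5 * (t_a + t_b)


def build_g(t, g0, q, g_amb):
    # Conductance matrix G(t_k) for a chain of cells, each also tied to ambient.
    n = len(t)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        g[i][i] += g_amb
    for i in range(n - 1):
        gij = link_conductance(t[i], t[i + 1], g0, q)
        g[i][i] += gij
        g[i + 1][i + 1] += gij
        g[i][i + 1] -= gij
        g[i + 1][i] -= gij
    return g


def euler_step(t, p, cap, dt, g0, q, g_amb):
    # t_{k+1} = (I - dt C^-1 G(t_k)) t_k + dt C^-1 p_k, with diagonal C.
    g = build_g(t, g0, q, g_amb)
    n = len(t)
    t_next = []
    for i in range(n):
        flow = sum(g[i][j] * t[j] for j in range(n))
        t_next.append(t[i] + dt / cap[i] * (p[i] - flow))
    return t_next


def simulate(p, cap, dt, steps, g0=1.0, q=-0.001, g_amb=0.5):
    # Start at ambient (zero rise) and apply a constant power vector p.
    t = [0.0] * len(p)
    for _ in range(steps):
        t = euler_step(t, p, cap, dt, g0, q, g_amb)
    return t
```

For example, `simulate([2.0, 0.0], [1.0, 1.0], 0.01, 5000)` heats the first cell directly and the second only through the link between them. The time step must stay small relative to `cap[i]` divided by the total conductance at a cell, which mirrors the slide's remark about choosing the step for convergence.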
Case Study: HW 4-Core MPSoC
MPSoC Philips board design:
• 4 processors, DVFS: 100/500 MHz
• Plastic packaging
Software: image watermarking, video rendering
Power values for 90nm:
Element      Max Power (mW) at 100 MHz   Max Power (mW) at 500 MHz
Processor    2.92 x 10^2                 1.02 x 10^3
D-Cache      1.42 x 10^2                 7.10 x 10^2
I-Cache      1.42 x 10^2                 7.10 x 10^2
Priv Mem     0.61 x 10^2                 2.75 x 10^2
AMBA         0.31 x 10^2                 0.68 x 10^2
Results: Thermal Validation 4-core Philips MPSoC
MPARM: cycle-accurate SW architectural simulator
• Complete power/thermal models tuned to Philips/IMEC figures
• Simulations too slow: 2 days for 0.18 real sec (12 cells). Many weeks of simulation?!
HW thermal emulation able to validate policies at run-time
• Dynamic Voltage and Frequency Scaling (DVFS) based on thresholds
• Emulation time 45 sec (128 cells)!
Very fast validation of MPSoC run-time thermal behavior and management
[Figure: average temperature of emulated 4-core MPSoC, temperature (Kelvin, 300 to 420) versus time (0 to 8 seconds). Curves: simulation in MPARM, emulation, emulation with DFS. Annotations: 500 MHz, 100 MHz, DVFS ON: 500/100 MHz, package limit (~85ºC)]
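A threshold-based DVFS policy of the kind validated above can be sketched as follows. The two frequencies come from the case study (500/100 MHz), but the threshold values `upper_k` and `lower_k`, the hysteresis band between them, and the function names are assumptions made for illustration; the slides do not give the actual thresholds.

```python
F_HIGH_MHZ = 500  # fast DVFS setting of the 4-core case study
F_LOW_MHZ = 100   # slow DVFS setting of the 4-core case study


def next_frequency(temp_k, current_mhz, upper_k=350.0, lower_k=340.0):
    # Thresholds are assumed values in Kelvin, not figures from the slides.
    if temp_k >= upper_k:
        return F_LOW_MHZ      # too hot: slow down
    if temp_k <= lower_k:
        return F_HIGH_MHZ     # cool enough: speed up again
    return current_mhz        # inside the hysteresis band: keep the setting


def run_policy(temps_k, start_mhz=F_HIGH_MHZ):
    # Apply the policy to a sequence of temperature samples.
    freq = start_mhz
    history = []
    for temp in temps_k:
        freq = next_frequency(temp, freq)
        history.append(freq)
    return history
```

The gap between the two thresholds keeps the core from toggling between frequencies on every sample when the temperature hovers near a single limit.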
Temperature Management is Power Control under Thermal Constraints
Power consumption of cores determines thermal behavior
• Power consumption depends on frequency and voltage
• Setting frequencies/voltages can control power and temperature
Optimization problem: frequency/voltage assignment in MPSoCs under thermal constraints
• Meet processing requirements
• Respect thermal constraint at all times
• Minimize power consumption
HW-Based Thermal Management State-of-the-Art
Static approach: thermal-aware placement to try to even out worst-case thermal profile [Sapatnekar, Wong et al.]
• Computationally difficult problem (NP-complete)
• Not able to predict all working conditions; with leakage changing dynamically, it is not useful in real systems
Dynamic approach: HW-based dynamic thermal management
• Clock gating based on time-out [Xie et al., Brooks et al.]
• DVFS based on thresholds [Chaparro et al., Mukherjee et al.]
• Heuristics for component shut down, limited history [Donald et al.]
• These techniques aim to minimize power; they only achieve thermal management as a by-product
No formalization of the thermal optimization problem
Formalization of Thermal Management Problem in MPSoCs
Control theory problem
• Observable: geometrical properties and behavior
  • Heat flow model and thermal profile estimation
  • Performance counters
• Controllable: max. throughput under thermal constraints
  • Tuning knobs: frequencies/voltages of the system (DVFS)
Optimal frequency assignment module, 2-phase approach:
1) Design-time phase: find optimal sets of frequencies for the cores for different working conditions
2) Run-time phase: apply one of the predefined sets found in phase 1 for the required system performance
[Figure: control loop. Observed system: MPSoC with processor cores, thermal sensors and run-time HW DVFS support. Observer and control system: thermal profile state estimation and optimal frequency assignment module. Module inputs: performance counters (average frequency), thermal profile, requirements (max. throughput), constraints (max. temperature). Control output: cores freqs]
Pro-Active HW-Based Thermal Control: Phase 1 - Design-Time
Predictive model of thermal behavior given a set of frequency assignments
Phase inputs: chip floorplan; allowed core power values and frequencies; packaging and heat spreader information
Optimization problem: minimize sum of power consumption of cores
Constraints:
• Performance constraint: on average, freq. is f_avg
• Thermal equation: Si conductivity depends on temp.
• Power equation: quadratic dependence on freq.
• Meet temp. constraints at all time points
• Frequency in predefined range
Non-linear offline problem
Method outputs: table of cores frequencies assignments for the optimal frequency assignment module
Making Power and Thermal Constraints Convex
Power constraint adaptation
• Change non-affine (quadratic equality): $p_{max} (f_{i,k})^2 / (f_{max})^2 = p_{i,k}$; $i = 1, \dots, n$, $\forall k$
• To convex inequality: $p_{max} (f_{i,k})^2 / (f_{max})^2 \le p_{i,k}$; $i = 1, \dots, n$, $\forall k$
Thermal constraint adaptation
• Use worst case thermal conductivity in the range of allowed temperatures, and iterate (if needed) to optimum
Solve convex problem and get table of optimal frequencies for different working conditions in polynomial time (number of processors)
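Putting the pieces of the two preceding slides together, the design-time problem can be written out roughly as below. This is a sketch under assumptions rather than the exact formulation behind the slides: $t_{max}$, $f_{min}$ and the $\ge$ form of the performance constraint are introduced here for illustration, and $A$ is taken as constant because the worst-case conductivity replaces $G(t_k)$.

```latex
\begin{aligned}
\min_{f_{i,k},\, p_{i,k}} \quad & \sum_{k=1}^{m} \sum_{i=1}^{n} p_{i,k} \\
\text{s.t.} \quad
& p_{max}\, \frac{(f_{i,k})^2}{(f_{max})^2} \le p_{i,k},
  && i = 1, \dots, n,\ \forall k \\
& t_{k+1} = A\, t_k + B\, p_k, && \forall k \\
& t_k \le t_{max}, && \forall k \\
& \frac{1}{n} \sum_{i=1}^{n} f_{i,k} \ge f_{avg}, && \forall k \\
& f_{min} \le f_{i,k} \le f_{max},
  && i = 1, \dots, n,\ \forall k
\end{aligned}
```

The objective and the thermal update are linear, the relaxed power constraint is a convex quadratic inequality, and the remaining constraints are linear bounds, so the whole problem is convex. Minimizing total power drives the relaxed inequality to hold with equality at the optimum whenever the thermal limit allows it.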
Pro-Active HW-Based Thermal Control: Phase 2 - Run-Time, Putting It All Together
Use table of frequencies assignments and index by actual conditions at regular run-time intervals
Method inputs: current temperature of cores; targeted operating frequency of cores
Run-time optimal DVFS assignment HW module:
1) Index table output of phase 1 with current working conditions
2) Compare to current assignment to cores and generate required signaling to modify DVFS values
Phase output: run-time DVFS changes for processors
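The two run-time steps can be sketched in software as a table lookup followed by a diff. The slides describe a HW module, so this Python version is only an illustration; the table layout (a dict keyed by temperature bin and frequency bin), the bin edges, and the names `bin_index`, `select_assignment` and `dvfs_changes` are all assumptions.

```python
from bisect import bisect_left


def bin_index(edges, value):
    # Index of the first bin edge >= value, clamped to the last bin.
    return min(bisect_left(edges, value), len(edges) - 1)


def select_assignment(table, temp_edges, freq_edges, hottest_temp_c, target_freq_mhz):
    # Step 1: index the phase-1 table with the current working conditions.
    key = (bin_index(temp_edges, hottest_temp_c),
           bin_index(freq_edges, target_freq_mhz))
    return table[key]


def dvfs_changes(current, selected):
    # Step 2: compare with the current assignment and list only the cores
    # whose DVFS value has to change, as (core index, new frequency) pairs.
    return [(core, new)
            for core, (old, new) in enumerate(zip(current, selected))
            if old != new]
```

Rounding the measured temperature up to the next bin edge is the conservative choice here, since an entry computed for a hotter starting condition still respects the thermal limit when the chip is cooler.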
Case Study: 8-Core Sun MPSoC
MPSoC Sun Niagara architecture
• 8 processing cores SPARC T1
• Max. frequency each core: 1 GHz
• 10 DVFS values, applied every 100ms
• Max. power per core: 4 W
Execution characteristics of workloads [Sun Microsystems]:
• Mixes of 10 different benchmarks, from web-accessing to multimedia
• 60,000 iterations of basic benchmarks, tens of seconds of actual system execution
[Figure: Sun's Niagara MPSoC]
Results: Thermal Constraints Respected
Total run-time of benchmarks:
• DVFS: 180 sec
• 2-phase convex method: 106 sec (45% less exec. time)
Proposed method achieves better throughput than standard DVFS while satisfying thermal constraints
MPSoC System-Level Architecture: HW and SW Layers
[Figure: MPOS task manager distributing tasks A, B and C between PROC 1 and PROC 2, with the load and frequency of each core shown before and after task migration]
Can we control the MPSoC thermal profile by controlling software execution?
SW layers introduced to better exploit the HW of MPSoCs
• Applications divided in tasks: blocks of operations to be executed
• Multi-processor Operating System (MPOS) distributes the tasks
• Load balancing: equal distribution of work between processors
Task Migration for Load vs. Thermal Balancing
Plain load balancing
• No improvement in workload distribution possible: no migration
• Hot-spot!
[Figure: load and frequency of PROC 1 and PROC 2 running tasks A, B and C (FSE loads of 40% to 50%); temperature versus time for processor 1, processor 2 and the mean]
Task Migration for Load vs. Thermal Balancing
Heat&Run: load balancing with local knowledge of temperature in MPSoC components
• Helping with hot-spots, but no thermal balancing
[Figure: task migration from source PROC 1 to target PROC 2 for tasks A, B and C (FSE loads of 40% to 50%); load and frequency per core; temperature versus time]
Existing approaches do not consider global thermal dynamics for task migration
Task Migration for Load vs. Thermal Balancing
Contribution: migration strategy for thermal balancing
• Global knowledge of temperature at MPOS level
• Adjusted to particular thermal dynamics of each platform
Formalization
• Dynamic number of tasks, no control theory formalization possible
• Knapsack problem, move N largest tasks between cores: estimated increase in temperature and minimizing performance penalty
[Figure: temperature versus time with upper and lower thresholds]
Reduces hot-spots and reaches thermal balancing
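A greedy stand-in for the threshold-triggered migration step might look like the sketch below. It is not the MPOS policy from the slides: it ignores the estimated temperature increase and the performance penalty that the knapsack formulation weighs, and the function name, the `capacity` limit and the parameter `max_moves` (playing the role of N) are hypothetical.

```python
def plan_migration(temps, tasks, upper, lower, max_moves=1, capacity=100):
    # temps: temperature per core; tasks: list of task loads (%) per core.
    hot = max(range(len(temps)), key=lambda c: temps[c])
    cold = min(range(len(temps)), key=lambda c: temps[c])
    # Migrate only when the hottest core is above the upper threshold
    # and the coolest core is below the lower threshold.
    if temps[hot] < upper or temps[cold] > lower:
        return []
    free = capacity - sum(tasks[cold])
    moves = []
    # Greedy knapsack-style fill: try the largest tasks first.
    for load in sorted(tasks[hot], reverse=True):
        if len(moves) == max_moves:
            break
        if load <= free:
            moves.append((load, hot, cold))
            free -= load
    return moves
```

With `temps=[70, 50]`, `tasks=[[40, 30], [30]]`, `upper=65` and `lower=55`, the plan moves the 40% task from core 0 to core 1. When both cores sit between the thresholds, nothing is moved.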
Case Study: Freescale MPSoC Board
Hardware
• 3 RISC processor cores
• 16KB caches, 32KB shared mem.
• AMBA bus, 2GB ext. mem.
Software
• uCLinux-based MPOS
• Multimedia applications: audio and video
Two packaging options
• Mobile embedded SoCs (slow temperature variations)
• High performance SoCs (fast temperature variations)
Results and Comparisons
Good thermal balancing
• Average: 40.5ºC, variations of < 3ºC (+/-3º)
Small performance overhead (2 migrat/s)
• ~1.2ms @ 400MHz (1% overhead)
Comparisons with other policies
• Load balancing inefficient (>7ºC diffs)
• Heat&Run inefficient or causes many deadline misses (40% below performance requirements)
Contribution: performance requirements met for both types of packaging
Good performance and uniform temperature adjusting globally to thermal dynamics with MPOS
Adapt2D-MIGRA: Combination of HW and SW-Based Pro-Active Thermal Management
[Figure: thermal maps. Initial: large gradients. New: thermal balancing]
• HW-based management: convex-based dynamic voltage and frequency scaling (DVFS) exploration
• SW-based management: proactive task scheduling and migration
• Support of multi-processor operating system: Solaris Multi-Core
Good thermal control in commercial MPSoCs in 90nm, what about 3D integration?
Outline
• MPSoC thermal modeling and analysis
• HW-based thermal management for MPSoCs
• SW-based thermal management for MPSoCs
• Conclusions
Conclusions
Progress in semiconductor technologies enables new MPSoCs
• Thermal/reliability issues must be addressed for safe human interaction
• Thermal monitoring and control are key
Clear benefits of thermal-aware design methods for MPSoCs
• Novel, fast and low-cost thermal modeling approach at system-level
• Formalization of HW-based thermal management problem as convex, and solved in polynomial time
• New SW-based thermal balancing method with very limited overhead
Validation on commercial 2D-MPSoCs (Sun, Freescale, Philips)
• Fast exploration of thermal behavior of complex MPSoCs
• Effective HW- and SW-based pro-active thermal management
Key References and Bibliography
Thermal modeling and FPGA-based emulation
• “HW-SW Emulation Framework for Temperature-Aware Design in MPSoCs”, D. Atienza, et al., ACM TODAES, Vol. 12, Nr. 3, pp. 1–26, August 2007.
Thermal management for 2D MPSoCs
• “Thermal Balancing Policy for Multiprocessor Stream Computing Platforms”, F. Mulas, et al., IEEE T-CAD, Vol. 28, Nr. 12, pp. 1870-1882, December 2009.
• “Processor Speed Control with Thermal Constraints”, A. Mutapcic, S. Boyd, et al., IEEE TCAS-I, Vol. 56, Nr. 9, pp. 1994-2008, Sept 2009.
• “Inducing Thermal-Awareness in Multi-Processor Systems-on-Chip Using Networks-on-Chip”, E. Martinez, et al., Proc. ISVLSI 2009.
• “Temperature Control of High-Performance Multi-core Platforms Using Convex Optimization”, S. Murali, et al., Proc. DATE, 2008.
QUESTIONS ?
Acknowledgements:
• Swiss National Science Foundation
• European Commission
• UCSD / Sun Microsystems
• IMEC / Philips
• Bologna / Freescale semiconductors
• IBM Zürich
Thermal Modeling and Management for 3D MPSoCs with Active Cooling
Prof. David Atienza Alonso
Embedded Systems Laboratory (ESL), Institute of EE, Faculty of Engineering
© ESL/EPFL 2010
ARTIST Summer School 2010, Autrans (France)
Advantages of 3D vs. 2D Chips
Promises
• Reduce average length of on-chip global wires
• Increase number of devices reachable in given time budget
• Greatly facilitate heterogeneous integration (e.g. logic-DRAM stacks)
[Figure: Samsung Wafer Stack Package (WSP) memory]
[Figures: Ray Yarema, Fermilab]
Thermal-Reliability Issues in 3D Chips
Latest chips increase power density
• Non-uniform hot-spots in 2D chips
• In 3D chips, heat affects several layers! (even more “cool” components)
Higher chances of thermal wear-outs and very short lifetimes!
[Figures: Sun 1.8 GHz Sparc v9 Microproc; Sun Niagara Broadband Processor; 3D stack, courtesy IBM and Irvine Sens.]
Run-Time Heat Spreading in 3D Chips
• 5-tier 3D stack: 10 heat sources and sensors
• Inject between 4W – 1.5W
[Figure: temperature maps of Layers 2 to 5 (2nd to 5th tier); axes: length (um) and width (um), 0 to 10000 um each]
Large and non-uniform heat propagation! (up to 130º C on top tier)
NanoTera CMOSAIC Project: Design of 3D MPSoCs with Advanced Cooling
3D systems require novel electro-thermal co-design
• Academic partners: EPFL and ETHZ
• Industrial: IBM Zürich
3D stacked MPSoC chips: microchannels etched on back side to circulate liquid coolant
System-Level Active Cooling Manager
• Adjustment of coolant flux (3D heat flow prediction)
• Task scheduling and execution control
Outline
• Introduction
• 3D chip thermal modeling framework
• Validation of 3D thermal model
• Liquid cooling modeling
• Liquid cooling model validation
• Close-loop 3D MPSoCs thermal management with active cooling
• Experiments and conclusions
Compact RC-Based Tier Thermal Model
Gate-level thermal model
• RC network of Si/metal layer cells
• 2D tier modeled as heat flux moving between adjacent cells
• Convective boundary conditions between layers in tier:
  qb_top = h_top A (Ta - Ttop)
  qb_bottom = h_bottom A (Ta - Tbottom)
[Figure: cell with boundary heat fluxes on its six faces (qb_top, qb_bottom, qb_left, qb_right, qb_front, qb_back); neighbouring cells I-1, I, I+1 exchange flux (qb_i) through face i]
[Atienza et al., TODAES 2007]
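As a rough illustration of the cell-based RC idea, the sketch below advances a small 2D grid of thermal cells by explicit Euler steps. It is not the model from the cited paper: the function names, the single uniform lateral resistance `r_lat`, and the lumped convective resistance `r_amb = 1 / (h * A)` to ambient are all assumptions made for this example.

```python
def step_tier(temps, power, r_lat, r_amb, cap, t_amb, dt):
    """Advance a 2D grid of RC thermal cells by one explicit Euler step.

    temps, power: lists of rows (temperature in K, dissipated power in W).
    r_lat: thermal resistance between adjacent cells (K/W), assumed uniform.
    r_amb: convective resistance to ambient, i.e. 1 / (h * A) (K/W).
    cap:   thermal capacitance of one cell (J/K).
    """
    rows, cols = len(temps), len(temps[0])
    new_temps = [row[:] for row in temps]
    for i in range(rows):
        for j in range(cols):
            t = temps[i][j]
            q = power[i][j]  # heat generated inside the cell (W)
            # Heat flux exchanged with the lateral neighbours.
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    q += (temps[ni][nj] - t) / r_lat
            # Convective boundary: q_b = h * A * (Ta - T) = (Ta - T) / r_amb.
            q += (t_amb - t) / r_amb
            new_temps[i][j] = t + dt * q / cap
    return new_temps


def simulate(temps, power, steps, **params):
    """Repeat step_tier for a fixed number of steps."""
    for _ in range(steps):
        temps = step_tier(temps, power, **params)
    return temps
```

With a single cell, the steady state of this sketch is `t_amb + power * r_amb`, which gives a quick sanity check on any parameter choice. The explicit step is only stable for small `dt` relative to the cell time constants.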
Complete 3D Chip Thermal Modeling
Multi-level execution for thermal convergence in 3D
• Local (2D-tier), liquid channels and global (3D) propagation
[Flow diagram: (1) evaluate local temperature for each cell; (2) update with neighbour temperature diffusion, repeated N iterations until tier-level convergence; (3) go to next tier or microchannel; temperature feedback loops back to step (1)]
[Ayala et al., NanoNet 2009]
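The multi-level loop can be sketched as two nested relaxations: N diffusion sweeps inside each tier, followed by one exchange between vertically adjacent tiers. The code below is a simplified stand-in under stated assumptions, not the authors' solver: tiers are 1D rows of cells, `k_lat` and `k_vert` are hypothetical dimensionless coupling factors, boundaries are adiabatic, and microchannels and heat sources are left out.

```python
def relax_tier(cells, n_iter, k_lat):
    """Tier-level convergence: n_iter sweeps of neighbour temperature diffusion."""
    for _ in range(n_iter):
        nxt = cells[:]
        for i in range(len(cells)):
            left = cells[i - 1] if i > 0 else cells[i]
            right = cells[i + 1] if i < len(cells) - 1 else cells[i]
            nxt[i] = cells[i] + k_lat * (left + right - 2 * cells[i])
        cells = nxt
    return cells


def simulate_stack(stack, n_iter=1000, k_lat=0.2, k_vert=0.1, outer=10):
    """Alternate local (per-tier) relaxation with global (inter-tier) exchange."""
    for _ in range(outer):
        # Local level: converge each tier on its own.
        stack = [relax_tier(tier, n_iter, k_lat) for tier in stack]
        # Global level: feed temperatures back between adjacent tiers.
        nxt = [tier[:] for tier in stack]
        for z in range(len(stack)):
            for i in range(len(stack[z])):
                up = stack[z + 1][i] if z + 1 < len(stack) else stack[z][i]
                down = stack[z - 1][i] if z > 0 else stack[z][i]
                nxt[z][i] = stack[z][i] + k_vert * (up + down - 2 * stack[z][i])
        stack = nxt
    return stack
```

Because every exchange is symmetric, the sum of all cell temperatures stays constant in this sketch, which makes it easy to check that the two levels are wired together correctly.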
3D Chip Thermal Library Validation
Extensible set of layers in 3D stack
• Up to 9 tiers and heat spreader
• Pre-defined layers: silicon, copper (10 layers), glue, overmold, interposer, bump
Configurable nr. of cells and iterations per tier
• Initially 10ms thermal interval (1000 iterat./tier)
Multi-tier test chip manufactured at EPFL:
• Three types of tiers
3D Thermal Library Validation: Creating Various 3D Thermal Maps
Flexibility for thermal characterization
• 10 heat sources and sensors per layer, accessible to be simultaneously activated
3D Thermal Library Validation: Correlation with 5-Tier 3D Stack
[Figure: 3D Chip, EPFL, Layer 3 characterization. Sensor voltage (mV) vs. heater current (mA) applied to Dev 7. Blue curve: 3D current-heat model for D8; pink curve: heater current measured in D8]
[Figure: 3D Chip, EPFL, multi-tier characterization. Sensor voltage (mV) vs. heater current (mA) applied to Dev 7. Blue/pink curves: D7 (tier 1) and D8 (tier 4); red curve: 3D current-heat model for D8]
Variations of less than 1.5% between 3D stack measurements and new 3D thermal model
[Ayala et al., Nano-Nets ’09]
Modeling Through Silicon Vias (TSVs) in 3D Stacks
TSVs:
• Size: 5-10um x 10-100um
• TSVs change resistivity of interlayer material (IM)
Modeling granularities (from 1 to 3: higher accuracy, higher complexity):
1. Homogeneous distribution, one R value for the IM
2. Different R value per unit (core, cache, etc.)
3. Exact locations of TSVs
[Figure: TSV. Figure: LSM-EPFL]
[Figure source: IBM Zürich and Y.Heights]
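To make granularities 1 and 2 concrete, the sketch below computes an interlayer thermal resistance from a TSV area fraction. The area-weighted parallel-path formula is an assumption for illustration, not something stated on the slides, and the function names and the interlayer conductivity used in the usage note are hypothetical; the copper conductivity is only an approximate bulk value.

```python
K_COPPER = 400.0  # W/(m*K), approximate bulk copper conductivity


def interlayer_resistance(thickness_m, area_m2, tsv_fraction, k_im, k_tsv=K_COPPER):
    """Vertical thermal resistance (K/W) of an interlayer region.

    Assumes TSVs and interlayer material conduct heat in parallel, so the
    effective conductivity is the area-weighted mean of the two.
    """
    if not 0.0 <= tsv_fraction <= 1.0:
        raise ValueError("tsv_fraction must be between 0 and 1")
    k_eff = tsv_fraction * k_tsv + (1.0 - tsv_fraction) * k_im
    return thickness_m / (k_eff * area_m2)


def per_unit_resistances(units, thickness_m, k_im):
    """Granularity 2: one R value per unit, given {name: (area_m2, tsv_fraction)}."""
    return {
        name: interlayer_resistance(thickness_m, area, fraction, k_im)
        for name, (area, fraction) in units.items()
    }
```

Granularity 1 corresponds to a single call with the chip-wide TSV fraction, while granularity 2 calls `per_unit_resistances` with a separate fraction for each core or cache. For example, `per_unit_resistances({"core": (1e-6, 0.0), "cache": (1e-6, 0.01)}, 10e-6, 0.25)` gives the cache region a much lower resistance than the core region.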
TSV Modeling Accuracy in 3D Stacks
Chosen to model TSV groups in localized positions of 3D MPSoCs
Liquid Flux Model for Laminar Flow
Local junction temperature modeled as RC network:
  R_tot = R_cond + R_conv + R_heat
  R_tot = 1/(G_si/t + 1/R_b) + A/(b A_t) + A/(V ρ c_p)
[Figure annotations: heat source; thermal resist. of Si; chip back-side temperature; Si base thickness; heating area; total area; flow rate and density; thermal resistance of wiring; labels q2, T1, P]
Dependence of thermal resistance in liquid flux modeled as a quadratic form
• Variable value of coolant flux (Φ)
  ∆R_heat ≈ aΦ + bΦ^2 ; b
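One way to obtain the coefficients of the quadratic form ∆R_heat ≈ aΦ + bΦ^2 is a least-squares fit to measured points. The sketch below solves the 2x2 normal equations for a model without constant term; the function name and any data passed to it are hypothetical, and the slides do not say how a and b were actually determined.

```python
def fit_quadratic_through_origin(flux, delta_r):
    """Least-squares fit of delta_r ~ a * flux + b * flux**2 (no constant term).

    Solves the normal equations
        [S2 S3] [a]   [T1]
        [S3 S4] [b] = [T2]
    with Sk = sum(flux**k), T1 = sum(flux * delta_r), T2 = sum(flux**2 * delta_r).
    """
    s2 = sum(x ** 2 for x in flux)
    s3 = sum(x ** 3 for x in flux)
    s4 = sum(x ** 4 for x in flux)
    t1 = sum(x * y for x, y in zip(flux, delta_r))
    t2 = sum(x * x * y for x, y in zip(flux, delta_r))
    det = s2 * s4 - s3 * s3
    if det == 0:
        raise ValueError("need at least two distinct non-zero flux values")
    a = (t1 * s4 - t2 * s3) / det
    b = (s2 * t2 - s3 * t1) / det
    return a, b
```

Calling it with flux samples in L/min and the corresponding resistance changes returns the pair `(a, b)`; at least two distinct non-zero flux values are needed for the system to be solvable.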
3D Thermal Model with Liquid Cooling
New set of layers in 3D stack
• 3D stack (up to 9 tiers)
• 1 microchannel and coolant flow per tier
5-tier stack with microchannels and manifold cooling seal manufactured at IBM/EPFL
• Enables different multi-tier liquid flux injection
[Figure: micro-heater, liquid micro-channels and PCB. Source: IBM & ESL, EPFL]
Manufacturing of 5-Tier 3D Test Chip with Liquid Channels in Multiple Tiers
[Figure: front-side and back-side views. Figure: IBM & ESL, EPFL]
• Adding multi-tier liquid cooling in-/out-lets
• Multi-tier active cooling technology feasible for 3D-stacked chips
Correlation Results: Liquid Cooling and 3D Heat Transfer
Temperature evolution at the junction (Tj)
• Tested range: 0.015 to 0.15 L/min
• Similar accuracy results at different channels
• Avg Max temp Error = 0.6%
[Figure: test structure with labels q1, q2, T1, P1, P2]
Variations of less than 1% between measurements and RC-based 3D thermal model with liquid cooling
[Atienza et al., THERMINIC ’09]
Complete 3D Chip Thermal Modeling Flow with Liquid Cooling
[Flow diagram, top to bottom:]
1. Scheduler (Reactive, Proactive)
   Inputs:
   • Workload information
   • Floorplan, TSV areas, package
   • Temperature (for dynamic policies)
2. Power Manager (DPM)
   Inputs:
   • Workload information
   • Activity of cores
3. 3D Thermal Simulator w. Liquid Cooling based on EPFL-IBM 3D chips (integrated within internal HotSpot tool version)
   Inputs:
   • Power trace for each unit
   • Floorplan, package and die properties (Niagara-1), TSV area percentage/distribution
   • Flow rate
4. Output: Transient Temperature Response for Each Unit
Run-Time HW/SW Thermal Modeling Framework for 3D Chips
Exploitation of both hardware and software benefits
[Figure: zero-delay MPSoC architecture simulation. MPSoC behavior is emulated on FPGA (CPUs, SRAMs and I/O instrumented with sniffers), running a multi-proc. OS + DVFS + task migration with Sw app 1 ... Sw app N. The FPGA sends the energy of 2D components to a software thermal model on the host PC, which performs a detailed thermal analysis of the 2D MPSoC layout and returns the temp. (T) of 2D components, over a standard Ethernet connection & dedicated HW monitor] [D. Atienza et al., TODAES 2007]
[Figure: same framework extended to 3D. The FPGA sends the energy of 3D components to a 3D stack thermal model (1st tier to Nth tier) on the host PC, which returns the temp. of 3D components] [D. Atienza, THERMINIC 2009]
Thermal Management for 3D-MPSoCs with Liquid Cooling
Active-Adapt3D: Combined policy manager (Best-Paper Award at IEEE/IFIP VLSI-SoC 2009)
• Predictive, floorplan-based task assignment and DVFS
• Close-loop variable liquid cooling control
  T ≥ 80°C: increment flow rate; T < 80°C: decrement
Policy can be applied reactively or proactively
[Diagram: thermal sensors provide temperature measurements (system temperature dynamics). REACTIVE: the measurements feed the flow rate tuner directly. PROACTIVE: an ARMA-based temperature predictor feeds its forecast to the flow rate tuner]
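The threshold rule for the flow rate can be sketched as a small step controller over discrete pump settings. Only the 80°C threshold, the count of four settings and the 15 ml/min default come from the slides; the other three flow values, the function names, and the linear extrapolation used in place of the ARMA predictor are assumptions for illustration.

```python
# Hypothetical setting values: the slides only state that there are four
# settings with a default of 15 ml/min.
FLOW_SETTINGS_ML_MIN = (10.0, 15.0, 20.0, 25.0)
DEFAULT_INDEX = 1
THRESHOLD_C = 80.0


def tune_flow(index, temp_c):
    """Step the flow setting up at or above the threshold, down below it."""
    if temp_c >= THRESHOLD_C:
        return min(index + 1, len(FLOW_SETTINGS_ML_MIN) - 1)
    return max(index - 1, 0)


def forecast_next(history):
    """Crude stand-in for the ARMA predictor: linear extrapolation of the
    last two temperature samples."""
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]


def control_step(index, history, proactive):
    """Reactive mode uses the latest measurement, proactive mode the forecast."""
    temp = forecast_next(history) if proactive else history[-1]
    return tune_flow(index, temp)
```

With a rising history such as `[70.0, 76.0]`, the reactive controller still sees 76°C and steps the flow down, while the proactive one forecasts 82°C and steps it up before the threshold is crossed.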
Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs
Cores on layers closer to the heat sink can be cooled faster in comparison to cores further away
Adapt-3D assigns a thermal index (α_i) to each core in order to distinguish the location of the cores
• Higher α_i: core more prone to hot spots
[Figure: two stacked chips (Chip-1, Chip-2) with cores at locations 1, 2 and 3, and the ordering of their thermal indices]
[Coskun and Atienza, DATE ‘09]
Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs
Probability of receiving workload at time t, for each core i:
  P_t = P_{t-1} + W
Weight:
  W = β_inc · W_init / α_i    if T_avg < T_preferred  (cool core)
  W = β_dec · W_init · α_i    if T_avg ≥ T_preferred  (hot core)
  W_init = T_preferred - T_avg
• α_i: thermal index of core i
• β_inc, β_dec: empirical constants
• T_avg: measured by sensors
• T_preferred: e.g., 80ºC
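The probability update can be sketched as below, assuming a weight of the form W = β_inc · W_init / α_i for cool cores and W = β_dec · W_init · α_i for hot cores, with W_init = T_preferred - T_avg. This is one reading of the slide's formula, not a verified transcription; the constant values, the clamping to [0, 1] and the function names are hypothetical.

```python
def update_probability(p_prev, t_avg, alpha, t_pref=80.0,
                       beta_inc=0.01, beta_dec=0.1):
    """Return P_t = P_{t-1} + W for one core, clamped to [0, 1].

    alpha is the core's thermal index (higher means more prone to hot spots).
    beta_inc and beta_dec are placeholder values for the empirical constants.
    """
    w_init = t_pref - t_avg
    if t_avg < t_pref:
        # Cool core: raise the probability, less so for hot-spot-prone cores.
        w = beta_inc * w_init / alpha
    else:
        # Hot core: w_init <= 0, so this lowers the probability, more
        # strongly for hot-spot-prone cores.
        w = beta_dec * w_init * alpha
    return min(1.0, max(0.0, p_prev + w))


def pick_core(probabilities):
    """Assign the next task to the core with the highest probability."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])
```

For two cores at 70°C with the same starting probability, the one with the larger thermal index ends up with the smaller probability, so `pick_core` favours the core that is easier to cool.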
Experiments 3D Thermal Management: 3D MPSoCs with Microchannels
Target 3D systems based on 3D version of Sun UltraSPARC T1
• Power values and workloads from real traces measured in Sun platforms (multimedia players, web servers, databases, etc.)
• Cores and caches in separate layers
Channels: width 400um, depth 250um. Four flow rate settings, default at 15ml/min.
[Figure: stack configurations (EXP1-2), (EXP3), (EXP4)]
Thermal Management for 3D Chips: Active-Adapt3D Comparisons
Predictive task scheduling, active cooling and floorplan-aware DVFS achieve less than 5% hotspots
Promising figures for thermal control in 3D-MPSoCs
[Coskun and Atienza, DATE ‘10]
Thermal Management in 3D Chips: Active-Adapt3D Comparisons
Variable multi-tier flow control useful for 3D systems with 3+ layers. Proactive thermal management achieves:
• 75% reduction in spatial gradients on average, for fixed flow rate
• 97% reduction in spatial gradients on average, for variable flow rate
Cooling power savings up to 67% with respect to worst-case flux
*LC: Multi-tier variable liquid cooling
[Coskun and Atienza, DATE ‘10]
Conclusions
Complexity of coming 3D MPSoC chips requires novel thermal modeling approaches
• Application of simple RC-based methods demonstrated, validated with 3D test chip
• Initial model of liquid cooling channels in 3D chips
  Simple RC laminar flow model, works well with variable liquid fluxes (errors of less than 2%)
  Integrated the compact model into custom HotSpot tool
New thermal management: feedback controller adjusts flow rate to allowed temperature with job assignment and DVFS
• Proactive control improves the hot spot reduction to 95% for systems with variable flow rates, and reduces thermal variations
• Dynamic flow rate adjustment is helpful in reducing the energy cost of the pump and overall system (67% power savings)
Key References and Bibliography
3D Thermal modeling and FPGA-based emulation
• “3D-ICE: Compact transient thermal model for 3D ICs with liquid cooling via enhanced heat transfer cavity geometries”, A. Sridhar, et al., Proc. of ICCAD 2010, USA, November 2010.
• “Transient Thermal Modeling of 2D/3D Systems-on-Chip with Active Cooling”, David Atienza, Proc. of THERMINIC 2009, Belgium, October 2009.
Thermal management for 3D MPSoCs
• “Fuzzy Control for Enforcing Energy Efficiency in High-Performance 3D Systems”, M. Sabry, Ayse K. Coskun, David Atienza, Proc. of ICCAD 2010, USA, November 2010.
• “Energy-Efficient Variable-Flow Liquid Cooling in 3D Stacked Architectures”, Ayse K. Coskun, David Atienza, et al., Proc. of DATE 2010, Germany, March 2010.
• “Modeling and Dynamic Management of 3D Multicore Systems with Liquid Cooling”, Ayse K. Coskun, et al., Proc. of VLSI-SoC 2009, Brazil, October 2009. (Best Paper Award)
• “Dynamic Thermal Management in 3D Multicore Architectures”, Ayse K. Coskun, et al., Proc. of DATE 2009, France, April 2009.
QUESTIONS ?
[Logos: Nano-Tera.ch Swiss Engineering Programme; European Commission; Swiss National Science Foundation]