ARTIST Summer School in Europe 2010, Autrans (near Grenoble), France, September 5-10, 2010. Thermal-Aware Design of 2D/3D Multi-Processor System-on-Chip Architectures. Invited Speaker: David Atienza, Professor and Director of Embedded Systems Laboratory (ESL), EPFL. http://www.artist-embedded.org/


  • ARTIST Summer School in Europe 2010

    Autrans (near Grenoble), France, September 5-10, 2010

    Thermal-Aware Design of 2D/3D Multi-Processor System-on-Chip Architectures

    Invited Speaker: David Atienza, Professor and Director of Embedded Systems Laboratory (ESL), EPFL

    http://www.artist-embedded.org/

  • Evolution of Electronics to Multi-Processor System-on-Chip (MPSoC)

    CMOS roadmap continues: 90 → 65 → 45 nm
    Multi-Processor System-on-Chip (MPSoC) architectures
    [Figure: MPSoC block diagram with processing elements (PE, each with local CPU and local memory hierarchy), SRAM blocks, I/O, peripherals and SDRAM main memory]
    Examples: 80-tile 1.28 TFLOPS MPSoC, Intel [ISSCC '07]; [Cell Multi-Processor – PS3]

  • MPSoCs are Spreading Fast

    [Figure: number of cores per chip (1 to 512) versus year (1970 to 20??), source [Amarasinghe06]. Labeled processors include: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon, Power4, Power6, Opteron, Opteron 4P, Xeon MP, PA-8800, Yonah, PExtreme, Tanglewood, Xbox360, Broadcom 1480, Niagara, Cell, Raw, Cavium Octeon, Raza XLR, Cisco CSR-1, Picochip PC102, Ambric AM2045, Intel Tflops]

  • Design Issues in MPSoCs

    MPSoCs have very complex architectures
      Advanced components and CAD tools very expensive
      Time-closure issues, system speed decreased
    Aggravated thermal issues
      Hot-spots, non-uniform thermal gradients
    High chances of thermal wear-outs and very short lifetimes!
    [Figures: [Sun, 1.8 GHz Sparc v9 Microproc]; [Sun, Niagara Broadband Processor]; sources: [Santarini, EDN, March '05], [Coskun et al '07, Sun]]

  • Thermal Issues Become More Critical for 3D-MPSoCs

    [Figure: 3D integration of the MPSoC, stacking PE layers (PEs and SRAM) with I/O, peripherals and SDRAM main memory]
    More power and more non-uniform heat spreading!

  • Reliability Degradation Factors in MPSoCs

    Thermal Hot Spots
    ∆T Fatigue failures increase with:
      Magnitude of variation
      Frequency of cycles
    Caused by:
      Power on/off
      Power management (turning off cores)
    Spatial Gradients
    [Figure: T (C), 60 to 90, versus Time (sec), 1 to 271]

  • Advocating Thermal-Aware 2D/3D MPSoC Design

    Integration of HW/SW modeling and management
      Heat Flow Models
      Fast Thermal Exploration
      HW Tuning knobs
      HW Thermal monitoring
      HW-Based Thermal Management Policies
      SW-Based Thermal Management Policies

  • Outline

    Part 1: Thermal Modeling and Management for 2D MPSoCs
    Part 2: Thermal Modeling and Management for 3D MPSoCs with Active Cooling
    Acknowledgements: Prof. Ayse K. Coskun (Boston University and Sun Microsyst.), Dr. Srinivasan Murali (iNoCs and EPFL), Prof. Jose L. Ayala (UCM), Thomas Brunschwiler and Dr. Bruno Michel (IBM Zürich), Prof. Stephen Boyd (Stanford University)

  • Thermal Modeling, Analysis and Management of 2D Multi-Processor System-on-Chip

    Prof. David Atienza Alonso
    Embedded Systems Laboratory (ESL), Institute of EE, Faculty of Engineering
    ARTIST Summer School 2010, Autrans (France)

  • Outline

    MPSoC thermal modeling and analysis
    HW-based thermal management for MPSoCs
    SW-based thermal management for MPSoCs
    Conclusions

  • MPSoC Thermal Modeling Problem

    Continuous heat flow analysis
      Capture geometrical characteristics of MPSoCs
      Explore different packaging features and heat sink characteristics
    Time-variant heat sources
      Transistor switching depends on MPSoC run-time activity (software)
      Dynamic interaction with heat flow analysis
    Very complex computational problem!

  • MPSoC Thermal Modeling State-of-the-Art

    MPSoC Modeling and Exploration
      1. SW simulation: Transactions, cycle-accurate (~100 kHz) [Synopsys Realview, Mentor Primecell, Madsen et al., Angiolini et al.]
         At the desired cycle-accurate level, they are too slow for thermal analysis of real-life applications!
      2. HW prototyping: Core dependent (~50-100 MHz) [Cadence Palladium II, ARM Integrator IP, Heron Engineering]
         Very expensive and late in design flow, no thermal modeling, only used for functional validation of MPSoC architectures!
    Heat Flow Modeling:
      1. Software thermal/power models [Skadron et al., Kang et al.]
         Too computationally intensive and not able to interact at run-time with inputs from MPSoC components!
    Combination of cycle-accurate MPSoC behavior and IC heat flow modeling at run-time is unheard of

  • Orthogonalizing MPSoC Thermal Modeling and Analysis

    [Figure: MPSoC behavior emulation on FPGA (CPUs, SRAM and I/O, each with an attached sniffer) exchanging Energy and Temperature (T) data with a SW thermal estimation tool (Host PC)]
    Framework: MPSoC behavioral model on reconfigurable HW interacting with efficient thermal estimation

  • Chip and Package Heat Flow Modeling

    Model interface
      Input: power model of MPSoC components, geometrical properties
      Output: temperature of MPSoC components at run-time
    Thermal circuit: 1st order RC circuit
      Heat flow ~ Electrical current; Temperature ~ Voltage
      Heat spreader and IC composed of elementary blocks (Cu and Si cells)
      Si thermal conductivity depends on temperature (IMEC & Freescale, 90nm)
    $C\,\dot{t}_k = -G(t_k)\,t_k + p_k$ ; k = 1..m
      $C$: thermal capacitance matrix (entries $C_{si,1}, C_{si,2}, \dots, C_{cu,n}$)
      $G$: thermal conductance matrix (entries $G_{1,2}$, $-G_{1,2}$, $G_{2,1}$, $-G_{2,1}$, ...)
      $\dot{t}_k$: temperature change; $t_k$: temperature vector at instant k; $p_k$: power consumption vector
    [Figure: Si thermal conductivity (W/(m·K), 90 to 160) versus temperature (in Celsius, 27 to 127), actual value]

  • SW Thermal Estimation Tool for MPSoCs

    $C\,\dot{t}_k = -G(t_k)\,t_k + p_k$ ; k = 1..m
    Creating linear approximation while retaining variable Si thermal conductivity:
      Si thermal conductivity linearly approx.: $G_{i,j}(t_k) = I + q\,t_k$
      Numerically integrating $\dot{t}_k$ in the discrete time domain:
        $t_{k+1} = A(t_k)\,t_k + B\,p_k$ ; k = 1..m
        $A(t_k) = I - dt\,C^{-1} G(t_k)$ ; $B = dt\,C^{-1}$
      Time step chosen small enough for convergence
    Complexity scales linearly with the number of modeled cells
    Thermal library validated against 3D finite element model (IMEC & Freescale)
    [Figure: Si thermal conductivity dependent on temperature; thermal conductivity (W/(m·K), 90 to 160) versus temperature (in Celsius, 27 to 127); curves: actual value, linear fit]
    [Figure: 60 sec of MPSoC heat flow analysis (simulated on P4 @ 3GHz); heat flow estimation time (s) versus number of cells (0 to 10000); curves: non-linear thermal estimation, proposed linear thermal estimation]
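
The discrete update $t_{k+1} = A(t_k)\,t_k + B\,p_k$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the validated thermal library from the slides: temperatures are taken as rise above ambient, the cells form a 1-D chain, $C$ is diagonal, and the names `conductance_matrix`, `euler_step`, `simulate`, `g_lateral`, `g_ambient` and all numeric values are hypothetical.

```python
def conductance_matrix(temps, g_lateral, g_ambient, q):
    """Build G(t) for a 1-D chain of cells (assumed layout).

    Each lateral conductance follows a linear fit g_lateral * (1 + q * t_avg),
    where t_avg is the mean temperature rise of the two neighbouring cells.
    """
    n = len(temps)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        G[i][i] += g_ambient  # assumed path from every cell to ambient
    for i in range(n - 1):
        g = g_lateral * (1.0 + q * 0.5 * (temps[i] + temps[i + 1]))
        G[i][i] += g
        G[i + 1][i + 1] += g
        G[i][i + 1] -= g  # off-diagonal terms are negative, as on the slide
        G[i + 1][i] -= g
    return G


def euler_step(temps, power, capacitance, dt, g_lateral, g_ambient, q):
    """One step t_{k+1} = t_k + dt * C^{-1} * (p_k - G(t_k) t_k), C diagonal."""
    G = conductance_matrix(temps, g_lateral, g_ambient, q)
    n = len(temps)
    return [
        temps[i]
        + dt / capacitance[i]
        * (power[i] - sum(G[i][j] * temps[j] for j in range(n)))
        for i in range(n)
    ]


def simulate(power, capacitance, dt, steps, g_lateral=1.0, g_ambient=1.0, q=0.0):
    """Integrate from ambient (zero temperature rise) for a number of steps."""
    temps = [0.0] * len(power)
    for _ in range(steps):
        temps = euler_step(temps, power, capacitance, dt, g_lateral, g_ambient, q)
    return temps


# Two cells, only the first one dissipates power (made-up units).
final = simulate(power=[1.0, 0.0], capacitance=[1.0, 1.0], dt=0.01, steps=5000)
print(final)
```

With `q = 0` the two-cell example settles near a temperature rise of 2/3 for the powered cell and 1/3 for its neighbour, the solution of $G\,t = p$. A negative `q` (conductivity falling with temperature) leaves the powered cell slightly hotter. The time step must stay small relative to the fastest thermal time constant, which is the convergence condition the slide mentions.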

  • Case Study: HW 4-Core MPSoC

    MPSoC Philips board design:
      4 processors, DVFS: 100/500 MHz
      Plastic packaging
    Software: Image watermarking, video rendering
    Power values for 90nm:

      Element      Max Power (mW) at 100 MHz    Max Power (mW) at 500 MHz
      Processor    2.92 x 10^2                  1.02 x 10^3
      D-Cache      1.42 x 10^2                  7.10 x 10^2
      I-Cache      1.42 x 10^2                  7.10 x 10^2
      Priv Mem     0.61 x 10^2                  2.75 x 10^2
      AMBA         0.31 x 10^2                  0.68 x 10^2

  • Results: Thermal Validation 4-core Philips MPSoC

    MPARM: Cycle-accurate SW architectural simulator
      Complete power/thermal models tuned to Philips/IMEC figures
      Simulations too slow: 2 days for 0.18 real sec (12 cells)
      Many weeks of simulation?!
    HW thermal emulation able to validate policies at run-time
      Dynamic Voltage and Frequency Scaling (DVFS) based on thresholds
      Emulation time 45 sec (128 cells)!
    Very fast validation of MPSoC run-time thermal behavior and management
    [Figure: average temperature of emulated 4-core MPSoC; temperature (Kelvin, 300 to 420) versus time (seconds, 0 to 8); curves: Simulation in MPARM, Emulation, Emulation with DFS; annotations: 500 MHz, DVFS ON: 500/100 MHz, 100 MHz, package limit (~85ºC)]
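
A threshold-based DVFS policy of the kind validated above can be sketched as follows. The two frequencies (500 and 100 MHz) come from the case study; the trigger temperatures `t_high_c` and `t_low_c` and the function name `threshold_dvfs` are assumptions made for illustration, chosen to sit below the ~85 ºC package limit shown in the plot.

```python
F_HIGH_MHZ = 500  # fast operating point from the case study
F_LOW_MHZ = 100   # slow operating point from the case study


def threshold_dvfs(temp_c, current_mhz, t_high_c=80.0, t_low_c=70.0):
    """Pick the next core frequency from the current temperature.

    Two thresholds give hysteresis: throttle at or above t_high_c, speed up
    again only at or below t_low_c, otherwise keep the current setting.
    Both thresholds are hypothetical values.
    """
    if temp_c >= t_high_c:
        return F_LOW_MHZ
    if temp_c <= t_low_c:
        return F_HIGH_MHZ
    return current_mhz


# Walk through a made-up temperature trace.
freq = F_HIGH_MHZ
trace = []
for temp in (60.0, 75.0, 82.0, 76.0, 69.0):
    freq = threshold_dvfs(temp, freq)
    trace.append(freq)
print(trace)
```

The gap between the two thresholds keeps the core from toggling between operating points on every sample when the temperature hovers around a single limit.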

  • Outline

    MPSoC thermal modeling and analysis
    HW-based thermal management for MPSoCs
    SW-based thermal management for MPSoCs
    Conclusions

  • Temperature Management is Power Control under Thermal Constraints

    Power consumption of cores determines thermal behavior
      Power consumption depends on frequency and voltage
      Setting frequencies/voltages can control power and temperature
    Optimization problem: frequency/voltage assignment in MPSoCs under thermal constraints
      Meet processing requirements
      Respect thermal constraint at all times
      Minimize power consumption

  • HW-Based Thermal Management State-of-the-Art

    Static approach: thermal-aware placement to try to even out worst-case thermal profile [Sapatnekar, Wong et al.]
      Computationally difficult problem (NP-complete)
      Cannot predict all working conditions, and leakage changes dynamically, so it is not useful in real systems
    Dynamic approach: HW-based dynamic thermal management
      Clock gating based on time-out [Xie et al., Brooks et al.]
      DVFS based on thresholds [Chaparro et al., Mukherjee et al.]
      Heuristics for component shut down, limited history [Donald et al.]
      These techniques aim to minimize power; they only achieve thermal management as a by-product
    No formalization of the thermal optimization problem

  • Formalization of Thermal Management Problem in MPSoCs

    Control theory problem
      Observable: Geometrical properties and behavior
        Heat flow model and thermal profile estimation
        Performance counters
      Controllable: Max. throughput under thermal constraints
        Tuning knobs: frequencies/voltages of the system (DVFS)
    Optimal frequency assignment module, 2-phase approach:
      1) Design-time phase: Find optimal sets of frequencies for the cores for different working conditions
      2) Run-time phase: Apply one of the predefined sets found in phase 1 for the required system performance
    [Figure: control loop. Observed system: MPSoC (processor cores, thermal sensors, thermal profile estimation, performance counters (average frequency)). Observer and control system: optimal frequency assignment module, with requirements (max. throughput) and constraints (max. temperature). Control output: core frequencies through run-time HW DVFS support]

  • Pro-Active HW-Based Thermal Control: Phase 1 - Design-Time

    Predictive model of thermal behavior given a set of frequency assignments
    Phase inputs:
      Chip floorplan
      Allowed core power values and frequencies
      Packaging, heat spreader information
    Method: optimal frequency assignment module (non-linear offline problem)
      Optimization problem: Minimize sum of power consumption of cores
      Constraints:
        Performance constraint: on average, freq. is f_avg
        Thermal equation: Si conductivity depends on temp
        Power equation: quadratic dependence on freq.
        Meet temp. constraints at all time points
        Frequency in predefined range
    Method outputs: Table of cores frequencies assignments
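
One way to write the design-time problem listed above as a single program is sketched below. The objective, the thermal recurrence and the quadratic power equation follow the slides; the exact form of the performance constraint (written here as a lower bound on the mean core frequency) and the symbols $t_{max}$ and $f_{min}$ are assumptions, since the slides only name these constraints. Here $p_{i,k}$ is the power of core $i$ at time step $k$, and the cell power vector $p_k$ is built from these values.

```latex
\begin{aligned}
\min_{f_{i,k},\,p_{i,k}} \quad & \sum_{k=1}^{m} \sum_{i=1}^{n} p_{i,k} \\
\text{s.t.} \quad
 & \tfrac{1}{n} \sum_{i=1}^{n} f_{i,k} \ge f_{avg}, && \forall k
   \quad \text{(assumed form of the performance constraint)} \\
 & t_{k+1} = A(t_k)\, t_k + B\, p_k, && \forall k \\
 & p_{max}\, (f_{i,k})^2 / (f_{max})^2 = p_{i,k}, && i = 1,\dots,n,\ \forall k \\
 & t_k \le t_{max}, && \forall k \\
 & f_{min} \le f_{i,k} \le f_{max}, && i = 1,\dots,n,\ \forall k
\end{aligned}
```

As written, the program is not convex: the power equation is a quadratic equality and $A(t_k)$ depends on the temperature itself.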

  • Making Power and Thermal Constraints Convex

    Power constraint adaptation
      Change non-affine (quadratic equality): $p_{max}\,(f_{i,k})^2 / (f_{max})^2 = p_{i,k}$ ; i = 1,..,n, ∀k
      To convex inequality: $p_{max}\,(f_{i,k})^2 / (f_{max})^2 \le p_{i,k}$ ; i = 1,..,n, ∀k
    Thermal constraint adaptation
      Use worst case thermal conductivity in the range of allowed temperatures, and iterate (if needed) to optimum
    Solve convex problem and get table of optimal frequencies for different working conditions in polynomial time (number of processors)
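
A small numeric check shows why the power constraint is relaxed. The set of (frequency, power) pairs satisfying the quadratic equality is not convex, because the midpoint of two valid pairs falls off the curve, while the same midpoint still satisfies the inequality. The values `P_MAX_W = 4.0` and `F_MAX_GHZ = 1.0` and the function names are illustrative assumptions.

```python
P_MAX_W = 4.0    # illustrative maximum core power
F_MAX_GHZ = 1.0  # illustrative maximum core frequency


def dynamic_power(f_ghz):
    """Quadratic power model p = p_max * f^2 / f_max^2."""
    return P_MAX_W * (f_ghz / F_MAX_GHZ) ** 2


def satisfies_equality(f_ghz, p_w, tol=1e-12):
    """Original non-affine constraint: power exactly on the quadratic curve."""
    return abs(dynamic_power(f_ghz) - p_w) <= tol


def satisfies_inequality(f_ghz, p_w, tol=1e-12):
    """Relaxed convex constraint: power on or above the quadratic curve."""
    return dynamic_power(f_ghz) <= p_w + tol


def midpoint(a, b):
    """Component-wise average of two (frequency, power) pairs."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


a = (0.2, dynamic_power(0.2))
b = (1.0, dynamic_power(1.0))
m = midpoint(a, b)

print(satisfies_equality(*a), satisfies_equality(*b))   # both points on the curve
print(satisfies_equality(*m), satisfies_inequality(*m))  # midpoint: off the curve, inside the relaxed set
```

Because the objective minimizes the sum of core powers, an optimal solution has no reason to keep any $p_{i,k}$ above the curve, so the relaxation would be expected to be tight at the optimum.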

  • Pro-Active HW-Based Thermal Control: Phase 2 - Run-Time, Putting It All Together

    Use table of frequencies assignments and index by actual conditions at regular run-time intervals
    Method inputs:
      Current temperature of cores
      Targeted operating frequency of cores
    Run-time optimal DVFS assignment HW module:
      1) Index table output of phase 1 with current working conditions
      2) Compare to current assignment to cores and generate required signaling to modify DVFS values
    Phase output: Run-time DVFS changes for processors
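
The two run-time steps can be sketched as a table lookup followed by a comparison. Everything concrete here is hypothetical: the table contents, the temperature bins, the choice to index by the hottest core, and the names `FREQ_TABLE`, `temperature_bin` and `dvfs_changes`. The slides describe a HW module; the Python below only mirrors its logic.

```python
# Hypothetical phase-1 output for a 2-core system:
# (temperature bin in Celsius, target average frequency in MHz)
#   -> per-core frequencies in MHz. All values are made up.
FREQ_TABLE = {
    (60, 500): (500, 500),
    (60, 1000): (1000, 1000),
    (80, 500): (400, 600),
    (80, 1000): (800, 1000),
}
TEMP_BINS_C = (60, 80)  # upper edges of the assumed temperature bins


def temperature_bin(max_core_temp_c):
    """Return the first bin covering the hottest core, clamped to the last bin."""
    for upper in TEMP_BINS_C:
        if max_core_temp_c <= upper:
            return upper
    return TEMP_BINS_C[-1]


def dvfs_changes(core_temps_c, target_mhz, current_mhz):
    """Step 1: index the table. Step 2: report only the cores that must change."""
    wanted = FREQ_TABLE[(temperature_bin(max(core_temps_c)), target_mhz)]
    return {
        core: new
        for core, (old, new) in enumerate(zip(current_mhz, wanted))
        if old != new
    }


# Core 1 has heated up to 72 C while both cores run at 1000 MHz.
changes = dvfs_changes([55.0, 72.0], 1000, (1000, 1000))
print(changes)
```

Returning only the differences matches step 2 above, where signaling is generated solely for cores whose DVFS value has to be modified. A real module would also need a defined action for temperatures above the last bin instead of simply clamping.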

  • Case Study: 8-Core Sun MPSoC

    MPSoC Sun Niagara architecture
      8 processing cores SPARC T1
      Max. frequency each core: 1 GHz
      10 DVFS values, applied every 100 ms
      Max. power per core: 4 W
    Execution characteristics of workloads [Sun Microsystems]:
      Mixes of 10 different benchmarks, from web-accessing to multimedia
      60,000 iterations of basic benchmarks, tens of seconds of actual system execution
    [Figure: Sun's Niagara MPSoC]

  • Results: Thermal Constraints Respected

    Total run-time of benchmarks:
      DVFS: 180 sec
      2-phase Convex method: 106 sec (45% less exec. time)
    Proposed method achieves better throughput than standard DVFS while satisfying thermal constraints

  • Outline

    MPSoC thermal modeling and analysis
    HW-based thermal management for MPSoCs
    SW-based thermal management for MPSoCs
    Conclusions

  • MPSoC System-Level Architecture: HW and SW Layers

    SW layers introduced to better exploit the HW of MPSoCs
      Applications divided in tasks: blocks of operations to be executed
      Multi-processor Operating System (MPOS) distributes the tasks
      Load balancing: equal distribution of work between processors
    Can we control the MPSoC thermal profile by controlling software execution?
    [Figure: MPOS task manager dispatching tasks to Core #1 and Core #2 with different loads (30%, 35%, 60%, 70%); load/frequency bars (0 to 100%) for PROC 1 and PROC 2 with tasks A, B and C; task migration of TASK B]

  • Task Migration for Load vs. Thermal Balancing

    Plain load balancing
      No improvement in workload distribution possible: no migration
      [Figure: load/frequency bars (0 to 100%) for PROC 1 and PROC 2 with tasks A, B and C; temperature versus time for Processor 1, Processor 2 and the mean, with a hot-spot]
    Heat&Run: Load balancing with local knowledge of temperature in MPSoC components
      Helping with hot-spots, but no thermal balancing
      [Figure: task migration of TASK B from source (PROC 1) to target (PROC 2); temperature versus time]
    Existing approaches do not consider global thermal dynamics for task migration

  • Task Migration for Load vs. Thermal Balancing

    Contribution: migration strategy for thermal balancing

    • Global knowledge of temperature at MPOS level
    • Adjusted to the particular thermal dynamics of each platform

    Formalization

    • Dynamic number of tasks: no control theory formalization possible
    • Knapsack problem: move the N largest tasks between cores, estimating the increase in temperature and minimizing the performance penalty

    [Figure: temperature vs. time, kept between an upper threshold and a lower threshold]

    Reduces hot-spots and reaches thermal balancing
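The threshold-triggered migration described on this slide could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the policy of the cited work: `Task`, `plan_migrations`, the threshold offsets around the mean temperature and the greedy "largest task first" choice are all hypothetical, and the temperature-increase estimate and performance penalty of the knapsack formulation are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    load: float  # fraction of one core's capacity (e.g. 0.4 for a 40% load)


def plan_migrations(temps, tasks_by_core, upper, lower, n_largest=1):
    """Return a list of (task_name, source_core, target_core) moves.

    temps: dict mapping core id -> temperature in deg C
    tasks_by_core: dict mapping core id -> list of Task
    upper, lower: hypothetical offsets above/below the mean temperature
    """
    mean = sum(temps.values()) / len(temps)
    # Cores above the upper threshold, hottest first
    hot = sorted((c for c in temps if temps[c] > mean + upper),
                 key=lambda c: temps[c], reverse=True)
    # Cores below the lower threshold, coldest first
    cold = sorted((c for c in temps if temps[c] < mean - lower),
                  key=lambda c: temps[c])
    moves = []
    for src, dst in zip(hot, cold):  # pair hottest core with coldest core
        largest = sorted(tasks_by_core[src], key=lambda t: t.load, reverse=True)
        moves.extend((task.name, src, dst) for task in largest[:n_largest])
    return moves
```

No move is planned while all cores stay inside the band around the mean, which is what keeps the migration rate (and its overhead) low in this sketch.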

  • Case Study: Freescale MPSoC Board

    Hardware
    • 3 RISC processor cores
    • 16KB caches, 32KB shared mem.
    • AMBA bus, 2GB ext. mem.

    Software
    • uCLinux-based MPOS
    • Multimedia applications: audio and video

    Two packaging options
    • Mobile embedded SoCs (slow temperature variations)
    • High performance SoCs (fast temperature variations)

  • Results and Comparisons

    Good thermal balancing
    • Average: 40.5ºC, variations of < 3ºC (+/-3º)

    Small performance overhead (2 migrat./s)
    • ~1.2ms @ 400MHz (1% overhead)

    Comparisons with other policies
    • Load balancing inefficient (>7ºC diffs)
    • Heat&Run inefficient or causes many deadline misses (40% below performance requirements)

    Contribution: performance requirements met for both types of packaging

    Good performance and uniform temperature, adjusting globally to thermal dynamics with MPOS

  • Adapt2D-MIGRA: Combination of HW- and SW-Based Pro-Active Thermal Management

    [Figure: thermal maps. Initial: large gradients. New: thermal balancing]

    HW-based management: convex-based dynamic voltage and frequency scaling (DVFS) exploration

    SW-based management: proactive task scheduling and migration
    • Support of multi-processor operating system: Solaris Multi-Core

    Good thermal control in commercial MPSoCs in 90nm, what about 3D integration?

  • Outline

    MPSoC thermal modeling and analysis
    HW-based thermal management for MPSoCs
    SW-based thermal management for MPSoCs
    Conclusions

  • Conclusions

    Progress in semiconductor technologies enables new MPSoCs
    • Thermal/reliability issues must be addressed for safe human interaction
    • Thermal monitoring and control are key

    Clear benefits of thermal-aware design methods for MPSoCs
    • Novel, fast and low-cost thermal modeling approach at system-level
    • Formalization of HW-based thermal management problem as convex, and solved in polynomial time
    • New SW-based thermal balancing method with very limited overhead

    Validation on commercial 2D-MPSoCs (Sun, Freescale, Philips)
    • Fast exploration of thermal behavior of complex MPSoCs
    • Effective HW- and SW-based pro-active thermal management

  • Key References and Bibliography

    Thermal modeling and FPGA-based emulation
    • “HW-SW Emulation Framework for Temperature-Aware Design in MPSoCs”, D. Atienza, et al., ACM TODAES, Vol. 12, Nr. 3, pp. 1–26, August 2007.

    Thermal management for 2D MPSoCs
    • “Thermal Balancing Policy for Multiprocessor Stream Computing Platforms”, F. Mulas, et al., IEEE T-CAD, Vol. 28, Nr. 12, pp. 1870-1882, December 2009.
    • “Processor Speed Control with Thermal Constraints”, A. Mutapcic, S. Boyd, et al., IEEE TCAS-I, Vol. 56, Nr. 9, pp. 1994-2008, Sept. 2009.
    • “Inducing Thermal-Awareness in Multi-Processor Systems-on-Chip Using Networks-on-Chip”, E. Martinez, et al., Proc. ISVLSI 2009.
    • “Temperature Control of High-Performance Multi-core Platforms Using Convex Optimization”, S. Murali, et al., Proc. DATE, 2008.

  • QUESTIONS ?

    Acknowledgements: Swiss National Science Foundation, European Commission, UCSD / Sun Microsystems, IMEC / Philips, Bologna / Freescale Semiconductors, IBM Zürich

  • Thermal Modeling and Management for 3D MPSoCs with Active Cooling

    Prof. David Atienza Alonso
    Embedded Systems Laboratory (ESL), Institute of EE, Faculty of Engineering

    © ESL/EPFL 2010

    ARTIST Summer School 2010, Autrans (France)

  • Advantages of 3D vs. 2D Chips

    Promises
    • Reduce average length of on-chip global wires
    • Increase number of devices reachable in given time budget
    • Greatly facilitate heterogeneous integration (e.g. logic-DRAM stacks)

    Samsung Wafer Stack Package (WSP) memory

    [Figures: Ray Yarema, Fermilab]

  • Thermal-Reliability Issues in 3D Chips

    Latest chips increase power density
    • Non-uniform hot-spots in 2D chips
    • In 3D chips, heat affects several layers! (even more “cool” components)

    Higher chances of thermal wear-outs and very short lifetimes!

    [Figures: Sun, 1.8 GHz Sparc v9 Microproc.; Sun, Niagara Broadband Processor; courtesy: IBM and Irvine Sens.]

  • Run-Time Heat Spreading in 3D Chips

    5-tier 3D stack: 10 heat sources and sensors
    • Inject between 1.5W and 4W

    [Figure: thermal maps of layers 2 to 5 (2nd to 5th tier); axes: length (um) vs. width (um), each from 0 to 10000; color-scale values between about 390 and 420]

    Large and non-uniform heat propagation! (up to 130ºC on top tier)

  • NanoTera CMOSAIC Project: Design of 3D MPSoCs with Advanced Cooling

    3D systems require novel electro-thermal co-design
    • Academic partners: EPFL and ETHZ
    • Industrial: IBM Zürich

    3D stacked MPSoC chips: microchannels etched on back side to circulate liquid coolant

    System-Level Active Cooling Manager (3D heat flow prediction)
    • Adjustment of coolant flux
    • Task scheduling and execution control

  • Outline

    Introduction
    3D chip thermal modeling framework
    Validation of 3D thermal model
    Liquid cooling modeling
    Liquid cooling model validation
    Closed-loop 3D MPSoCs thermal management with active cooling
    Experiments and conclusions

  • Compact RC-Based Tier Thermal Model

    Gate-level thermal model
    • RC network of Si/metal layer cells
    • Each cell exchanges heat flux through its six faces: $q_{b,top}$, $q_{b,bottom}$, $q_{b,left}$, $q_{b,right}$, $q_{b,front}$, $q_{b,back}$

    2D tier modeled as heat flux moving between adjacent cells

    Convective boundary conditions between layers in tier
    • $q_{b,top} = h_{top} A (T_a - T_{top})$
    • $q_{b,bottom} = h_{bottom} A (T_a - T_{bottom})$

    [Figure: cell with its face fluxes; face i shared by adjacent layers I-1, I and I+1]

    [Atienza et al., TODAES 2007]
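As a rough illustration of how such an RC cell network evolves, the following Python sketch advances a single row of cells by one explicit Euler step. It is a simplified stand-in rather than the model of the cited paper: the 1D layout, the function name `step_cells`, the single lateral resistance `r_lat` and the uniform capacitance `cap` are assumptions; only the convective term follows the boundary condition $q_{b,top} = h_{top} A (T_a - T_{top})$ stated on the slide.

```python
def step_cells(temps, power, r_lat, cap, h_top, area, t_amb, dt):
    """Advance a 1D row of RC cells by one explicit Euler step.

    temps: cell temperatures (K); power: heat generated per cell (W)
    r_lat: thermal resistance between adjacent cells (K/W), hypothetical
    cap: thermal capacitance of one cell (J/K), hypothetical
    h_top, area: convective coefficient (W/m^2 K) and top face area (m^2)
    """
    n = len(temps)
    new_temps = []
    for i, t in enumerate(temps):
        q = power[i]                        # heat generated inside the cell
        q += h_top * area * (t_amb - t)     # convective top boundary
        if i > 0:
            q += (temps[i - 1] - t) / r_lat  # flux from left neighbour
        if i < n - 1:
            q += (temps[i + 1] - t) / r_lat  # flux from right neighbour
        new_temps.append(t + dt * q / cap)   # C * dT/dt = net heat flux
    return new_temps
```

An explicit step like this is only stable for small `dt` relative to the cell time constants; an implicit solver would be the usual choice for long transients.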

  • Complete 3D Chip Thermal Modeling

    Multi-level execution for thermal convergence in 3D
    • Local (2D-tier), liquid channels and global (3D) propagation

    Tier-level convergence (N iterations)
    • Evaluate local temperature for each cell
    • Update with neighbour temperature diffusion
    • Go to next tier or microchannel
    • Feedback temperature

    [Ayala et al., NanoNet 2009]
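The multi-level loop above (local iterations per tier, then propagation between tiers) might be sketched as the toy relaxation below. This is an assumption-laden illustration, not the library's algorithm: the function names, the dimensionless diffusion coefficients `k` and `k_vert`, and the 1D tiers are all hypothetical, and microchannels are left out.

```python
def relax_tier(cells, sources, n_iter, k=0.25):
    """Run N local iterations on one tier: diffuse heat between neighbours."""
    for _ in range(n_iter):
        prev = cells[:]
        last = len(cells) - 1
        for i in range(len(cells)):
            left = prev[i - 1] if i > 0 else prev[i]      # insulated edge
            right = prev[i + 1] if i < last else prev[i]  # insulated edge
            cells[i] = prev[i] + k * (left + right - 2 * prev[i]) + sources[i]
    return cells


def simulate_stack(tiers, sources, n_iter, sweeps, k_vert=0.1):
    """Alternate tier-level convergence with global (3D) propagation."""
    for _ in range(sweeps):
        for t in range(len(tiers)):          # go to next tier
            relax_tier(tiers[t], sources[t], n_iter)
        prev = [tier[:] for tier in tiers]   # feedback temperature
        for t in range(len(tiers) - 1):
            for i in range(len(tiers[t])):
                flow = k_vert * (prev[t][i] - prev[t + 1][i])
                tiers[t][i] -= flow          # heat leaves the hotter cell
                tiers[t + 1][i] += flow      # and enters the one above/below
    return tiers
```

With no sources, the sketch conserves the total heat and settles to a uniform temperature, which is a quick sanity check for any such scheme.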

  • 3D Chip Thermal Library Validation

    Extensible set of layers in 3D stack
    • Up to 9 tiers and heat spreader
    • Pre-defined layers: silicon, copper (10 layers), glue, overmold, interposer, bump

    Configurable nr. of cells and iterations per tier
    • Initially 10ms thermal interval (1000 iterat./tier)

    Multi-tier test chip manufactured at EPFL
    • Three types of tiers

  • 3D Thermal Library Validation: Creating Various 3D Thermal Maps

    Flexibility for thermal characterization
    • 10 heat sources and sensors per layer, accessible to be simultaneously activated

  • 3D Thermal Library Validation: Correlation with 5-Tier 3D Stack

    3D chip, EPFL, layer 3 characterization
    • Blue curve: 3D current-heat model for D8
    • Pink curve: heater current measured in D8

    3D chip, EPFL, multi-tier characterization
    • Blue/pink curves: D7 (tier 1) and D8 (tier 4)
    • Red curve: 3D current-heat model for D8

    [Figures: sensor voltage (mV) vs. heater current (mA), applied to Dev 7]

    Variations of less than 1.5% between 3D stack measurements and new 3D thermal model

    [Ayala et al., Nano-Nets ’09]

  • Modeling Through Silicon Vias (TSVs) in 3D Stacks

    TSVs:
    • Size: 5-10um x 10-100um
    • TSVs change resistivity of interlayer material (IM)

    Modeling granularities (from 1 to 3: higher accuracy, higher complexity):
    1. Homogeneous distribution, one R value for the IM
    2. Different R value per unit (core, cache, etc.)
    3. Exact locations of TSVs

    [TSV figure: LSM-EPFL. Source: IBM Zürich and Y. Heights]
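To make granularities 1 and 2 concrete, the sketch below derives an effective thermal resistivity of the interlayer material from the fraction of its area occupied by TSVs. The parallel-path mixing rule, the function name `effective_resistivity` and the interlayer conductivity of 0.25 W/(m K) are assumptions made for illustration and are not taken from the slides; 400 W/(m K) is the approximate conductivity of copper.

```python
def effective_resistivity(tsv_fraction, k_tsv=400.0, k_im=0.25):
    """Effective thermal resistivity (m K/W) of the interlayer material.

    tsv_fraction: share of the area occupied by TSVs, between 0 and 1
    k_tsv: TSV fill conductivity in W/(m K) (about 400 for copper)
    k_im: interlayer material conductivity in W/(m K), hypothetical value
    Vertical heat paths through TSVs and IM are treated as parallel.
    """
    if not 0.0 <= tsv_fraction <= 1.0:
        raise ValueError("tsv_fraction must be within [0, 1]")
    k_eff = tsv_fraction * k_tsv + (1.0 - tsv_fraction) * k_im
    return 1.0 / k_eff


# Granularity 1: one value for the whole interlayer (hypothetical 1% TSV area)
r_homogeneous = effective_resistivity(0.01)

# Granularity 2: one value per unit (hypothetical per-unit TSV fractions)
r_per_unit = {unit: effective_resistivity(f)
              for unit, f in {"core": 0.02, "cache": 0.0}.items()}
```

Even a small TSV fraction lowers the effective resistivity sharply in this model, because the metal path dominates the parallel combination.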

  • TSV Modeling Accuracy in 3D Stacks

    Chosen to model TSV groups in localized positions of 3D MPSoCs

  • Liquid Flux Model for Laminar Flow

    Local junction temperature modeled as RC network:

    $R_{tot} = R_{cond} + R_{conv} + R_{heat}$

    $R_{tot} = \frac{1}{G_{Si}/t + 1/R_b} + \frac{A}{b A_t} + \frac{A}{V \rho c_p}$

    • First term: thermal resistance of Si, with Si base thickness $t$ and thermal resistance of wiring $R_b$
    • $A$: heating area; $A_t$: total area
    • $V$, $\rho$: flow rate and density

    Dependence of thermal resistance on liquid flux modeled as a quadratic form
    • Variable value of coolant flux (Φ)
    • $\Delta R_{heat} \approx a\Phi + b\Phi^2$ ; b
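As a small worked example of the flow-dependent term, the sketch below evaluates a heat-capacity resistance of the form A / (V ρ c_p) for water and the quadratic approximation ΔR_heat ≈ aΦ + bΦ². The function names, the heater area and the flow rates are hypothetical values chosen for illustration, and reading the last term of the slide's formula this way is itself an assumption; the water properties are standard room-temperature values.

```python
RHO_WATER = 998.0   # density of water, kg/m^3
CP_WATER = 4182.0   # specific heat of water, J/(kg K)


def r_heat(area_m2, flow_l_per_min):
    """Area-specific resistance A / (V * rho * c_p) in m^2 K/W."""
    flow_m3_per_s = flow_l_per_min * 1e-3 / 60.0  # L/min -> m^3/s
    return area_m2 / (flow_m3_per_s * RHO_WATER * CP_WATER)


def delta_r_heat(phi, a, b):
    """Quadratic approximation of the change in R_heat with coolant flux."""
    return a * phi + b * phi ** 2


# Hypothetical 1 cm^2 heater at two flow rates
low_flow = r_heat(1e-4, 0.015)
high_flow = r_heat(1e-4, 0.15)
```

Since this term scales with 1/V, a tenfold increase in flow rate lowers it tenfold, which is why the flow rate is an effective control knob.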

  • 3D Thermal Model with Liquid Cooling

    New set of layers in 3D stack
    • 3D stack (up to 9 tiers)
    • 1 microchannel and coolant flow per tier

    5-tier stack with microchannels and manifold cooling seal manufactured at IBM/EPFL
    • Enables different multi-tier liquid flux injection

    [Figure: micro-heater, liquid, PCB and micro-channels. Source: IBM & ESL, EPFL]

  • Manufacturing of 5-Tier 3D Test Chip with Liquid Channels in Multiple Tiers

    [Figure: front-side and back-side views, adding multi-tier liquid cooling in-/out-lets. Figure: IBM & ESL, EPFL]

    Multi-tier active cooling technology feasible for 3D-stacked chips

  • Correlation Results: Liquid Cooling and 3D Heat Transfer

    Temperature evolution at the junction (Tj)
    • Tested range: 0.015 to 0.15 L/min
    • Similar accuracy results at different channels
    • Avg. max. temp. error = 0.6%

    Variations of less than 1% between measurements and RC-based 3D thermal model with liquid cooling

    [Atienza et al., THERMINIC ’09]

  • Complete 3D Chip Thermal Modeling Flow with Liquid Cooling

    Scheduler (Reactive, Proactive) and Power Manager (DPM)
    • Inputs: workload information, activity of cores, temperature (for dynamic policies)

    3D Thermal Simulator w. Liquid Cooling based on EPFL-IBM 3D chips (integrated within internal HotSpot tool version)
    • Inputs: power trace for each unit; floorplan, package and die properties (Niagara-1), TSV area percentage/distribution; flow rate

    Output: transient temperature response for each unit

  • Run-Time HW/SW Thermal Modeling Framework for 3D Chips

    Exploitation of both hardware and software benefits

    MPSoC behavior emulation on FPGA
    • Multi-Proc. OS + DVFS + task migration, running Sw app 1 ... Sw app N
    • Zero-delay MPSoC architecture simulation: CPUs, SRAMs and I/O, each monitored by a sniffer

    Standard Ethernet connection & dedicated HW monitor
    • Exchanges energy of 2D components and temp. (T) of 2D components

    Software thermal model on host PC
    • Detailed thermal analysis of 2D MPSoC layout

    [D. Atienza et al., TODAES 2007]

  • Run-Time HW/SW Thermal Modeling Framework for 3D Chips

    Exploitation of both hardware and software benefits

    MPSoC behavior emulation on FPGA
    • Multi-Proc. OS + DVFS + task migration, running Sw app 1 ... Sw app N
    • Zero-delay MPSoC architecture simulation: CPUs, SRAMs and I/O, each monitored by a sniffer

    Standard Ethernet connection & dedicated HW monitor
    • Exchanges energy of 3D components and temp. of 3D components

    3D stack thermal model (1st tier to Nth tier) on host PC

    [D. Atienza, THERMINIC 2009]

  • Thermal Management for 3D-MPSoCs with Liquid Cooling

    Active-Adapt3D: combined policy manager (Best-Paper Award at IEEE/IFIP VLSI-SoC 2009)
    • Predictive, floorplan-based task assignment and DVFS
    • Closed-loop variable liquid cooling control: T ≥ 80°C → increment flow rate; T < 80°C → decrement
    • Policy can be applied reactively or proactively

    [Figure: control loop. REACTIVE: thermal sensors → temperature measurements → flow rate tuner → flow rate → system temperature dynamics. PROACTIVE: temperature measurements → ARMA-based predictor → temperature forecast → flow rate tuner]
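The 80°C rule of the flow rate tuner can be sketched as a step controller over a discrete list of flow settings. This is a hypothetical illustration: the function names, the setting values and the linear extrapolation used as forecast are assumptions, and the extrapolation is only a stand-in for the ARMA-based predictor named on the slide.

```python
# Hypothetical discrete pump settings in ml/min
FLOW_SETTINGS_ML_MIN = [10.0, 15.0, 20.0, 25.0]


def next_flow_index(index, temperature_c, n_settings, threshold_c=80.0):
    """T >= threshold: step the flow rate up; otherwise step it down."""
    if temperature_c >= threshold_c:
        return min(index + 1, n_settings - 1)
    return max(index - 1, 0)


def naive_forecast(history):
    """Linear extrapolation of the last two samples (not an ARMA model)."""
    return 2 * history[-1] - history[-2]


history = [70.0, 76.0]            # measured maximum temperatures, deg C
n = len(FLOW_SETTINGS_ML_MIN)
reactive = next_flow_index(1, history[-1], n)                 # uses 76.0
proactive = next_flow_index(1, naive_forecast(history), n)    # uses 82.0
```

In this example the reactive controller lowers the flow because the measured temperature is still below the threshold, while the proactive one raises it because the forecast already crosses 80°C.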

  • Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs

    Cores on layers closer to the heat sink can be cooled faster in comparison to cores further away

    Adapt-3D assigns a thermal index ($\alpha_i$) to each core in order to distinguish the location of the cores
    • Higher $\alpha_i$ → core more prone to hot spots

    [Figure: thermal indices for cores at locations 1, 2 and 3 on stacked Chip-1 and Chip-2]

    [Coskun and Atienza, DATE ’09]

  • Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs

    Probability of receiving workload at time t, for each core:

    $P_t = P_{t-1} + W$

    Weight:

    $W = \beta_{inc} \cdot W_{init} \cdot \frac{1}{\alpha_i}$ if $T_{avg} < T_{preferred}$ (cool core)

    $W = \beta_{dec} \cdot W_{init} \cdot \alpha_i$ if $T_{avg} \geq T_{preferred}$ (hot core)

    $W_{init} = T_{preferred} - T_{avg}$

    • $\beta_{inc}$, $\beta_{dec}$: empirical constants
    • $T_{avg}$: measured by sensors
    • $T_{preferred}$: e.g., 80ºC
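A possible reading of this probability update is sketched below in Python. The exact form is an assumption: the slide's equation is only partly legible, so the scaling of the weight by a per-core thermal index `alpha` and by the constants `beta_inc` and `beta_dec`, the function name, and the clamping of the result to [0, 1] are all hypothetical choices made for illustration.

```python
def update_probability(p_prev, t_avg, t_pref, alpha, beta_inc, beta_dec):
    """Update a core's probability of receiving workload.

    t_avg: average core temperature measured by sensors (deg C)
    t_pref: preferred temperature, e.g. 80 deg C
    alpha: thermal index of the core (higher = more prone to hot spots)
    beta_inc, beta_dec: empirical constants (hypothetical values in use)
    """
    w_init = t_pref - t_avg
    if t_avg < t_pref:
        # Cool core: raise the probability, less so for hot-spot-prone cores
        w = beta_inc * w_init / alpha
    else:
        # Hot core: lower the probability, more so for hot-spot-prone cores
        w = beta_dec * w_init * alpha
    # Clamp so the result stays a valid probability (an added assumption)
    return min(1.0, max(0.0, p_prev + w))
```

With this form, two cores at the same temperature are treated differently according to their position in the stack, which is the stated purpose of the thermal index.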

  • Experiments 3D Thermal Management: 3D MPSoCs with Microchannels

    Target 3D systems based on 3D version of Sun UltraSPARC T1
    • Power values and workloads from real traces measured in Sun platforms (multimedia players, web servers, databases, etc.)

    Cores and caches in separate layers

    Channels: width 400um, depth 250um. Four flow rate settings, default at 15ml/min.

    [Figure: 3D stack configurations (EXP1-2), (EXP3), (EXP4)]

  • Thermal Management for 3D Chips: Active-Adapt3D Comparisons

    Predictive task scheduling, active cooling and floorplan-aware DVFS achieves less than 5% hotspots

    Promising figures for thermal control in 3D-MPSoCs

    [Coskun and Atienza, DATE ’10]

  • Thermal Management in 3D Chips: Active-Adapt3D Comparisons

    Variable multi-tier flow control useful for 3D systems with 3+ layers

    Proactive thermal management achieves:
    • 75% reduction in spatial gradients on average, for fixed flow rate
    • 97% reduction in spatial gradients on average, for variable flow rate

    *LC: multi-tier variable liquid cooling

    Cooling power savings up to 67% compared to worst-case flux

    [Coskun and Atienza, DATE ’10]

  • Conclusions

    Complexity of coming 3D MPSoC chips requires novel thermal modeling approaches
    • Application of simple RC-based methods demonstrated, validated with 3D test chip
    • Initial model of liquid cooling channels in 3D chips: simple RC laminar flow model, works well with variable liquid fluxes (errors of less than 2%)
    • Integrated the compact model into custom HotSpot tool

    New thermal management: feedback controller adjusts flow rate to allowed temperature with job assignment and DVFS
    • Proactive control improves the hot spot reduction to 95% for systems with variable flow rates, and reduces thermal variations
    • Dynamic flow rate adjustment is helpful in reducing the energy cost of the pump and overall system (67% power savings)

  • Key References and Bibliography

    3D thermal modeling and FPGA-based emulation
    • “3D-ICE: Compact transient thermal model for 3D ICs with liquid cooling via enhanced heat transfer cavity geometries”, A. Sridhar, et al., Proc. of ICCAD 2010, USA, November 2010.
    • “Transient Thermal Modeling of 2D/3D Systems-on-Chip with Active Cooling”, David Atienza, Proc. of THERMINIC 2009, Belgium, October 2009.

    Thermal management for 3D MPSoCs
    • “Fuzzy Control for Enforcing Energy Efficiency in High-Performance 3D Systems”, M. Sabry, Ayse K. Coskun, David Atienza, Proc. of ICCAD 2010, USA, November 2010.
    • “Energy-Efficient Variable-Flow Liquid Cooling in 3D Stacked Architectures”, Ayse K. Coskun, David Atienza, et al., Proc. of DATE 2010, Germany, March 2010.
    • “Modeling and Dynamic Management of 3D Multicore Systems with Liquid Cooling”, Ayse K. Coskun, et al., Proc. of VLSI-SoC 2009, Brazil, October 2009. (Best Paper Award)
    • “Dynamic Thermal Management in 3D Multicore Architectures”, Ayse K. Coskun, et al., Proc. of DATE 2009, France, April 2009.

  • QUESTIONS ?

    Nano-Tera.ch Swiss Engineering Programme, European Commission, Swiss National Science Foundation