
Energy and heat-aware metrics for data centers

Jaume Salom, Laura Sisó
IREC - Catalonia Institute for Energy Research

Ariel Oleksiak, Mateusz Jarus
PSNC – Poznan Supercomputing and Networking Center

Thomas Zilio
IRIT – Institut de Recherche en Informatique de Toulouse

30 September 2013


Introduction

• CoolEmAll will provide tools that let planners and operators of data centers (DCs) run flexible, fast simulations to improve energy efficiency and reduce the associated carbon footprint

• Metrics are needed to quantify the improvement in energy efficiency achieved with CoolEmAll

EuroEcoDC Workshop - Karlsruhe


Contents

1. Present status of DC metrics
2. Properties of CoolEmAll metrics
3. New metrics proposed
4. Description of experiment
5. Check Imbalance of Temperature
6. Further steps: other metrics to test
7. Conclusions


Present status of DC metrics

• Metrics related to power for the complete DC
– PUE
– Global KPI

• Metrics that consider energy reuse, carbon emissions or water use
– ERE, CUE, WUE

• Metrics that consider the power required in idle conditions
– FVER

• Metrics for IT components
– Power Usage, resource/Watt
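
For orientation, the facility-level metrics named above can be written as simple ratios against the IT energy. The sketch below follows the usual Green Grid formulations of PUE, ERE, CUE and WUE; the function names, parameter names and sample figures are illustrative assumptions and do not come from the slides.

```python
def pue(total_energy_kwh: float, it_energy_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    return total_energy_kwh / it_energy_kwh


def ere(total_energy_kwh: float, reused_energy_kwh: float, it_energy_kwh: float) -> float:
    """Energy Reuse Effectiveness: energy reused outside the DC is subtracted first."""
    return (total_energy_kwh - reused_energy_kwh) / it_energy_kwh


def cue(co2_emissions_kg: float, it_energy_kwh: float) -> float:
    """Carbon Usage Effectiveness in kg CO2-eq per kWh of IT energy."""
    return co2_emissions_kg / it_energy_kwh


def wue(water_use_litres: float, it_energy_kwh: float) -> float:
    """Water Usage Effectiveness in litres per kWh of IT energy."""
    return water_use_litres / it_energy_kwh


# Hypothetical annual figures, chosen only to exercise the functions.
print(pue(1500.0, 1000.0))
print(ere(1500.0, 200.0, 1000.0))
```

All four share the same denominator, which is why they say little about how much useful work the IT equipment delivers.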


Properties of CoolEmAll metrics

• Focus on energy, not only on peak power

• Focus on temperature, not only on power

• Heat-aware metrics

• Focus on useful work of applications, not only on IT consumption

• Selection of useful and consistent metrics to assess different granularity levels of a DC (CPU, rack, room)

• Holistic approach


Properties of CoolEmAll metrics

• Granularity:
– Node unit
– Node group
– Rack level
– Room of a DC

• Focus on:
– Resource usage
– Energy
– Heat-aware


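New metrics proposed

Metrics at node-group level

• Node-group cooling index

Computed from the air inlet temperatures, using the recommended and allowed values given by ASHRAE.

$$CI_{NG,HI} = \left[ 1 - \frac{\sum_{T_{NG,x} > T_{max\text{-}rec}} \left( T_{NG,x} - T_{max\text{-}rec} \right)}{\left( T_{max\text{-}all} - T_{max\text{-}rec} \right) \cdot n} \right] \cdot 100$$

$$CI_{NG,LO} = \left[ 1 - \frac{\sum_{T_{NG,x} < T_{min\text{-}rec}} \left( T_{min\text{-}rec} - T_{NG,x} \right)}{\left( T_{min\text{-}rec} - T_{min\text{-}all} \right) \cdot n} \right] \cdot 100$$

Here $T_{NG,x}$ is the air inlet temperature at intake $x$ of the node group, $n$ is the number of intakes, and the subscripts rec and all denote the recommended and allowed limits.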


Metrics at node-group level

• Node-group cooling index - meaning

– $CI_{NG,HI}$ = 100%: all intake temperatures ≤ max. recommended temperature.

– $CI_{NG,HI}$ < 100%: at least one intake temperature > max. recommended temperature.

– $CI_{NG,LO}$ = 100%: all intake temperatures ≥ min. recommended temperature.

– $CI_{NG,LO}$ < 100%: at least one intake temperature < min. recommended temperature.
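
The two indices can be sketched in a few lines of Python. This is a minimal illustration of the formulas on the slide, not the CoolEmAll implementation; the function names are made up, and the thresholds in the example (18/27 °C recommended, 15/32 °C allowed) are placeholder values that should be checked against the ASHRAE class actually in use.

```python
from typing import Sequence


def cooling_index_hi(inlet_temps: Sequence[float], t_max_rec: float, t_max_all: float) -> float:
    """CI_NG,HI: penalises intakes hotter than the maximum recommended temperature."""
    n = len(inlet_temps)
    # Sum of over-temperatures, taken only over intakes above the recommended maximum.
    over = sum(t - t_max_rec for t in inlet_temps if t > t_max_rec)
    return (1.0 - over / ((t_max_all - t_max_rec) * n)) * 100.0


def cooling_index_lo(inlet_temps: Sequence[float], t_min_rec: float, t_min_all: float) -> float:
    """CI_NG,LO: penalises intakes colder than the minimum recommended temperature."""
    n = len(inlet_temps)
    # Sum of under-temperatures, taken only over intakes below the recommended minimum.
    under = sum(t_min_rec - t for t in inlet_temps if t < t_min_rec)
    return (1.0 - under / ((t_min_rec - t_min_all) * n)) * 100.0


# Hypothetical inlet temperatures in degrees Celsius for a four-intake node group.
temps = [20.0, 25.0, 29.0, 31.0]
print(cooling_index_hi(temps, t_max_rec=27.0, t_max_all=32.0))
print(cooling_index_lo(temps, t_min_rec=18.0, t_min_all=15.0))
```

In this example two intakes exceed the recommended maximum, so the high-side index drops below 100%, while no intake is below the recommended minimum and the low-side index stays at 100%.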



Metrics at node-group, rack and DC level

• Imbalance of CPU temperature

– $Im_{NG,temp}$ = 0 means that all nodes work at the same temperature
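
The formula for the imbalance metric did not survive in this transcript. The sketch below assumes one common form, the spread between the hottest and coldest CPU divided by the mean temperature, expressed in percent. That form is an assumption, chosen because it yields 0 when all nodes run at the same temperature; the function name is illustrative.

```python
from typing import Sequence


def temperature_imbalance(cpu_temps: Sequence[float]) -> float:
    """Assumed form of Im_NG,temp: (max - min) / mean * 100, in percent."""
    mean = sum(cpu_temps) / len(cpu_temps)
    return (max(cpu_temps) - min(cpu_temps)) / mean * 100.0


# Hypothetical CPU temperatures in degrees Celsius.
print(temperature_imbalance([50.0, 50.0, 50.0]))  # identical temperatures give 0.0
print(temperature_imbalance([40.0, 50.0, 60.0]))
```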


Description of experiment

• RECS prototype server from Christmann

• RECS: a high-density multi-node computer with 18 single-server nodes within one rack unit

• CPU: Intel Core i7-3615QE CPU @ 2.30 GHz, CPU cache: 6144 KB, RAM: 16 GB

• Load: OpenSSL benchmark


Description of experiment

• 6 configurations:
1. Idle
2. Full
3. Left
4. Right
5. Inlet
6. Outlet


Check Imbalance of Temperature

Unexpected imbalance!


Check Imbalance of Temperature

• Analysis:
– Failure of one fan on the right side!
– Imbalance was higher when the load was placed on the right side instead of the left side

• Metric recalculated assuming that the CPU temperature of the node with the failed fan equals the average of the other nodes with similar load
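
The correction described above can be sketched as a substitution step applied before the metric is recomputed. The node names, temperatures and helper name are hypothetical; only the idea of replacing the affected reading with the mean of similarly loaded nodes comes from the slide.

```python
from typing import Dict, List


def substitute_with_peer_average(
    temps: Dict[str, float], failed_node: str, peers: List[str]
) -> Dict[str, float]:
    """Return a copy of temps where failed_node takes the mean temperature of its peers."""
    corrected = dict(temps)  # leave the measured data untouched
    corrected[failed_node] = sum(temps[p] for p in peers) / len(peers)
    return corrected


# Hypothetical readings: node "n3" sits behind the failed fan and runs hot.
measured = {"n1": 60.0, "n2": 62.0, "n3": 85.0}
corrected = substitute_with_peer_average(measured, "n3", ["n1", "n2"])

# The spread between hottest and coldest node shrinks once the outlier is replaced.
print(max(measured.values()) - min(measured.values()))
print(max(corrected.values()) - min(corrected.values()))
```

Keeping the measured values and the corrected values side by side makes it easy to report both the raw metric, which flags the fault, and the recalculated one, which reflects the workload placement.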


Check Imbalance of Temperature

Balanced!

“Inlet” configuration: the temperature of the loaded nodes affects the temperature of the idle nodes


Further steps: other metrics to test

• Idea: heat-aware + useful work + energy

• Other metrics that will be analysed in depth:
1. Relation between imbalance of temperature and temperature or heat dissipated
2. Productivity (useful work / energy)
3. PUE Scalability
4. FVER


Further steps: other metrics to test

• PUE Scalability

[Figure: total power versus IT power, with reference lines for mPUE = 1.0, 1.5, 2.0 and 3.0. Source: The Green Grid, WP#49]


Further steps: other metrics to test

• PUE Scalability

[Figure: PUE scalability plot, axis label Power_total(i) (kW), showing four lines: Ideal Scalability (PUE = 1.0), Goal Scalability (PUE = 1.67), Proportional Scalability (mean PUE = 2.5) and Actual Scalability. Source: The Green Grid, WP#49]
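
One way to read the figure is as a comparison of slopes: the slope of the measured total-power versus IT-power line against the slope of a reference "proportional" line. The sketch below assumes that reading. The exact definition should be taken from The Green Grid WP#49, and the function names and sample measurements are hypothetical; the reference slope of 2.5 is the mean PUE quoted in the figure legend.

```python
from typing import Sequence


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def pue_scalability(
    it_power_kw: Sequence[float], total_power_kw: Sequence[float], proportional_slope: float
) -> float:
    """Assumed form: actual slope / proportional slope * 100, in percent."""
    actual_slope = least_squares_slope(it_power_kw, total_power_kw)
    return actual_slope / proportional_slope * 100.0


# Hypothetical measurements: 500 kW of fixed overhead plus 1.25 kW per kW of IT load.
it_power = [400.0, 800.0, 1200.0, 1600.0]
total_power = [1000.0, 1500.0, 2000.0, 2500.0]
print(pue_scalability(it_power, total_power, proportional_slope=2.5))
```

Under this reading, a large fixed overhead flattens the actual line and lowers the score, which is the same effect FVER captures from the energy side.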


Further steps: other metrics to test

• FVER – Fixed to Variable Energy Ratio

• Source: BSC

• How much energy produces useful work and how much could be removed

• E_fixed: energy consumed when useful work = 0

• During flat operation a DC can consume up to 80% of peak power!

1st Review, 30.10.2012, Brussels

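$$FVER_{DC} = 1 + \frac{E_{DC,fixed}\ [\mathrm{Wh}]}{E_{DC,variable}\ [\mathrm{Wh}]} \quad [\text{dimensionless}]$$

$$FVER_{IT} = 1 + \frac{E_{IT,fixed}\ [\mathrm{Wh}]}{E_{IT,variable}\ [\mathrm{Wh}]} \quad [\text{dimensionless}]$$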
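
As a worked example of the ratio, the sketch below evaluates FVER for a hypothetical measurement period. The function name and the energy figures are assumptions for illustration; how the fixed and variable parts are separated in practice is not specified on the slide.

```python
def fver(fixed_energy_wh: float, variable_energy_wh: float) -> float:
    """Fixed to Variable Energy Ratio: 1 + E_fixed / E_variable (dimensionless)."""
    return 1.0 + fixed_energy_wh / variable_energy_wh


# Hypothetical period: 800 Wh consumed regardless of load, 200 Wh that scales with useful work.
print(fver(800.0, 200.0))  # 5.0
```

A value close to 1 means almost all energy scales with useful work, while larger values indicate a large fixed share that is spent even when no work is done.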


Conclusions

• The imbalance of temperature makes it possible to detect failures of IT equipment.

• The imbalance of temperature and the node-group cooling index complement each other.

• Analysing several metrics together:
– Imbalance of Temperature
– Power Usage (power / maximum rated power)
– Productivity (useful work / energy)
– FVER
– PUE Scalability

will improve awareness of cooling requirements and of the possibility of reducing them.


Conclusions

• The relations between
– power,
– cooling requirements,
– resource usage, and
– workload management

will be identified in order to find appropriate strategies for improving energy efficiency

• First results from tests on the first prototype have been collected. More experiments will be carried out to validate the proposed metrics


Questions? Comments?
