Attacking the Power-Wall by Using Near-threshold Cores

Liang Wang




Liang Wang, ECE6332 Final 2

Power Wall

• The end of classical scaling:
– Vdd: almost constant
– Power density: increases roughly exponentially
– Utilization: decreases roughly exponentially

• We can fabricate more cores than we can power up at the same time; the cores left unpowered are "dark silicon".

Figure: Dark Silicon (from Venkatesh et al., ASPLOS’10)
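The exponential trends above can be illustrated with a back-of-the-envelope model. This is a minimal sketch assuming textbook post-Dennard scaling factors (assumptions, not taken from these slides): each generation shrinks features by S ≈ √2, so a fixed-area die holds S² more transistors, per-transistor capacitance drops by 1/S, frequency rises by S, and Vdd stays constant. Under a fixed chip power budget, the powerable fraction of the die then halves each generation. The function name `powered_fraction` is hypothetical.

```python
import math


def powered_fraction(generations: int, s: float = math.sqrt(2)) -> list[float]:
    """Fraction of a fixed-area die that a fixed power budget can keep on,
    per generation, under the assumed post-Dennard scaling factors."""
    fractions = []
    for g in range(generations + 1):
        capacitance = (1 / s) ** g      # per-transistor capacitance shrinks by 1/S
        frequency = s ** g              # clock frequency grows by S
        vdd = 1.0                       # Vdd stays (almost) constant
        per_transistor_power = capacitance * vdd ** 2 * frequency
        transistors = (s ** 2) ** g     # S^2 more transistors on the same area
        full_chip_power = per_transistor_power * transistors  # tracks power density
        # Budget normalized to generation 0 running fully powered
        fractions.append(min(1.0, 1.0 / full_chip_power))
    return fractions


print([round(u, 3) for u in powered_fraction(4)])
```

Because die area is held fixed, `full_chip_power` also tracks power density, which grows by S² per generation while the powered fraction shrinks by the same factor.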


Near-threshold Cores (NVt. Cores)

• Pros
– Low power per core.
– More cores per chip.

• Limitations
– Low per-core frequency, which reduces the throughput gained from parallelization.
– Variations, which harm both performance and functionality.

Will NVt. cores be a viable solution to push down the power-wall?


Outline

• Performance Model
• Analyses and Results
• Conclusion


System Modeling

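Symmetric multi-core system: total area $A$, power budget $P$.

A single core at supply voltage $v$: area $a$, power $p(v)$, frequency $f(v)$.

• Number of active cores: $n = \min\left(\frac{P}{p(v)}, \frac{A}{a}\right)$
• Serial speed: $S_{serial} = f(V_{max})$
• Parallel speed (per core): $S_{parallel} = f(v)$
• Speedup, from Amdahl’s law for an application with parallel ratio $\rho$:

$$speedup = \frac{1}{\frac{1-\rho}{S_{serial}} + \frac{\rho}{n \cdot S_{parallel}}}$$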

Per-core power and frequency versus $v$:
• Dynamic power: $\propto v^2 \cdot freq$
• Static power
• Frequency: fitted to circuit simulation
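The system model translates directly into code. This is a minimal sketch assuming $p(v)$ and $f(v)$ are supplied by the caller (the fitted circuit-simulation curves are not reproduced here); the function names and the example numbers are hypothetical. As in the formula, $n$ is left unrounded.

```python
def active_cores(P: float, A: float, a: float, p_v: float) -> float:
    """n = min(P / p(v), A / a): the core count is limited by either
    the chip power budget P or the total die area A."""
    return min(P / p_v, A / a)


def speedup(rho: float, f_vmax: float, f_v: float, n: float) -> float:
    """Amdahl's law: the serial fraction (1 - rho) runs on one core at
    S_serial = f(V_max); the parallel fraction rho runs on n cores,
    each at S_parallel = f(v)."""
    serial_time = (1.0 - rho) / f_vmax
    parallel_time = rho / (n * f_v)
    return 1.0 / (serial_time + parallel_time)


# Hypothetical example: 100 W, 400 mm^2 chip; 4 mm^2 cores drawing 0.5 W at v and
# running at half the V_max frequency (frequencies normalized so f(V_max) = 1).
n = active_cores(P=100.0, A=400.0, a=4.0, p_v=0.5)   # area-limited: min(200, 100)
print(n, speedup(rho=0.99, f_vmax=1.0, f_v=0.5, n=n))
```

Keeping `active_cores` separate makes it easy to see which constraint binds: lowering $v$ raises $P/p(v)$, but once that exceeds $A/a$ the chip is area-limited and only $f(v)$ keeps falling.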


Simulation Setup

• Circuits
– A single inverter
– Ripple-carry adders (32, 16, 8, and 4 bits)

• Technology library
– A modified version of the Predictive Technology Model (PTM)

• Technology nodes
– 45nm, 32nm, 22nm, 16nm

• Process variants
– HKMGS: high-performance, high-k metal gate with stress effect
– LP: low-power process

• CAD tools
– RC Compiler
– Spectre driven by Ocean


Voltage-Frequency Scaling

Annotated frequency drops: ~8x, ~400x, ~15x, ~$10^3$x

LP has a much larger frequency drop than HP for the same change in Vdd.

16nm has a larger frequency drop than 45nm for the same change in Vdd.
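One way to see why the drop is steeper for LP: low-power processes typically use higher-Vth transistors, so the same Vdd reduction pushes them closer to (or below) threshold, where drive current falls exponentially rather than polynomially. The sketch below illustrates this with an EKV-style drive-current interpolation, a textbook stand-in swapped in for illustration, not the PTM-fitted curves used in this study. The slope factor, thermal voltage, and both threshold voltages are assumed values.

```python
import math

# Hypothetical device parameters (assumptions, not values from this study)
N_SLOPE = 1.5    # subthreshold slope factor n
PHI_T = 0.0259   # thermal voltage kT/q near 300 K, in volts


def drive_current(vdd: float, vth: float) -> float:
    """EKV-style on-current (arbitrary units): roughly quadratic in (vdd - vth)
    above threshold, roughly exponential in (vdd - vth) below it."""
    x = (vdd - vth) / (2 * N_SLOPE * PHI_T)
    return math.log1p(math.exp(x)) ** 2


def frequency(vdd: float, vth: float) -> float:
    """Relative frequency ~ I_on / (C * Vdd); the load C cancels in ratios."""
    return drive_current(vdd, vth) / vdd


def frequency_drop(v_high: float, v_low: float, vth: float) -> float:
    """How many times slower the circuit runs at v_low than at v_high."""
    return frequency(v_high, vth) / frequency(v_low, vth)


# A lower-Vth (HP-like) and a higher-Vth (LP-like) device, both hypothetical
for label, vth in [("low Vth", 0.3), ("high Vth", 0.5)]:
    print(f"{label}: {frequency_drop(1.0, 0.4, vth):.1f}x slower at 0.4 V than at 1.0 V")
```

With these assumed parameters, the higher-Vth device slows down far more than the lower-Vth device over the same 1.0 V to 0.4 V reduction.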


Design space exploration (area): 45nm, HKMGS, IO cores, 100 W, parallel ratio 0.99

Figure annotations: saturating; peak is capped by total area; 2x peak from 200 to 6.4K
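The qualitative shape of this exploration (a peak capped by total area, then saturation once the chip becomes power-limited) can be reproduced by sweeping Vdd through the system model and keeping the best point. This is a minimal sketch under loudly hypothetical assumptions: the normalized core model below (frequency linear above a 0.25 V threshold, leakage proportional to $v$) is made up for illustration and is not the circuit-fitted model from this study; `max_cores` stands for $A/a$.

```python
# Hypothetical normalized core model (assumed, not the study's fitted curves)
VTH = 0.25    # threshold voltage (V)
LEAK = 0.1    # static power coefficient: static power = LEAK * v (W per core)
V_MAX = 1.0   # nominal supply voltage (V); the serial phase runs here


def f(v: float) -> float:
    """Normalized core frequency: f(V_MAX) = 1, and 0 at or below VTH."""
    return max(v - VTH, 0.0) / (V_MAX - VTH)


def p(v: float) -> float:
    """Per-core power in W: dynamic ~ v^2 * f(v) plus static ~ LEAK * v."""
    return v * v * f(v) + LEAK * v


def speedup(v: float, rho: float, power_budget: float, max_cores: float) -> float:
    """Amdahl speedup with n = min(P / p(v), A / a) cores running at f(v)."""
    n = min(power_budget / p(v), max_cores)
    return 1.0 / ((1.0 - rho) / f(V_MAX) + rho / (n * f(v)))


def best_vdd(rho: float, power_budget: float, max_cores: float) -> tuple[float, float]:
    """Sweep Vdd from 0.26 V to 1.00 V in 10 mV steps; return (Vdd, speedup) at the peak."""
    grid = [round(0.26 + 0.01 * i, 2) for i in range(75)]
    return max(((v, speedup(v, rho, power_budget, max_cores)) for v in grid),
               key=lambda point: point[1])


for max_cores in (200, 2000, 20000):   # growing die area, same 100 W budget
    v, s = best_vdd(rho=0.99, power_budget=100.0, max_cores=max_cores)
    print(f"A/a = {max_cores:>5}: best Vdd = {v:.2f} V, speedup = {s:.1f}")
```

In this toy setup the small die is area-limited and peaks at a relatively high Vdd; a larger die lets the optimum move toward near-threshold, and once the power budget binds, extra area no longer raises the peak, qualitatively similar to the saturation annotated on the slide.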


Cross-technology study: 500 mm², 80 W and 400 mm², 100 W


Compare to Dark Silicon

• NVt. cores alleviate the issue of low utilization.
• NVt. cores have better performance (up to 2x).

Figure annotations: 500 mm², 80 W, HKMGS; available cores on-chip


Variation

• NVt. cores are very sensitive to variations, which affect:
– Functionality (ratioed circuits)
– Performance (the focus of this project)

• Monte-Carlo simulation
– Performed at every Vdd setting
– 100 iterations per Vdd
– Process and mismatch variation
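A Monte-Carlo sweep of this kind can be sketched as follows. This is a hypothetical stand-in for the Spectre runs: it models process and mismatch variation as a single Gaussian shift in threshold voltage, uses an EKV-style drive-current interpolation (assumed, not the PTM models), draws 100 samples per Vdd as listed above, and reports the slowest sample relative to nominal. All parameter values are assumptions.

```python
import math
import random

N_SLOPE = 1.5        # subthreshold slope factor (assumed)
PHI_T = 0.0259       # thermal voltage kT/q near 300 K, in volts
VTH_NOMINAL = 0.3    # nominal threshold voltage in volts (assumed)
SIGMA_VTH = 0.03     # combined process + mismatch Vth std-dev in volts (assumed)
ITERATIONS = 100     # Monte-Carlo iterations per Vdd


def frequency(vdd: float, vth: float) -> float:
    """Relative frequency ~ I_on / Vdd, with an EKV-style I_on interpolation."""
    x = (vdd - vth) / (2 * N_SLOPE * PHI_T)
    return math.log1p(math.exp(x)) ** 2 / vdd


def worst_case_slowdown(vdd: float, seed: int = 0) -> float:
    """Nominal frequency divided by the slowest of ITERATIONS sampled circuits."""
    rng = random.Random(seed)  # same seed -> same Vth samples at every Vdd
    samples = [frequency(vdd, rng.gauss(VTH_NOMINAL, SIGMA_VTH)) for _ in range(ITERATIONS)]
    return frequency(vdd, VTH_NOMINAL) / min(samples)


for vdd in (1.0, 0.7, 0.5, 0.4):
    print(f"Vdd = {vdd:.1f} V: worst-case slowdown = {worst_case_slowdown(vdd):.2f}x")
```

Reusing the seed keeps the sampled Vth values identical across supply voltages, so the larger slowdown at lower Vdd comes only from the device becoming more sensitive to Vth near threshold.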


Voltage-Frequency Scaling Revisited

• HKMGS
– Up to 5x slowdown

• LP
– Up to 10x slowdown

• HKMGS
– Up to 10x slowdown

• LP
– Up to 100x slowdown


Impact of Variation (400 mm², 100 W, IO)

Figure annotations: lower utilization; worse performance; flatten Vdd


Conclusion

• In terms of performance:
– Simple cores (IO) are better.
– The HP process (HKMGS) is better.

• Lowering Vdd reduces dark silicon and improves throughput.

• NVt. cores are vulnerable to process variation.