25
Respin: Rethinking Near- Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu Computer Architecture Research Lab h"p://arch.cse.ohio-state.edu

Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Respin: Rethinking Near-Threshold Multiprocessor Design

with Non-Volatile Memory

XiangPan,AnysBacha,andRaduTeodorescuComputerArchitectureResearchLab

h"p://arch.cse.ohio-state.edu

Page 2: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Universal Demand for Low Power

2

•  Mobility•  Ba"erylife

•  Performance•  Powerconstraints

•  Energycost•  Environment

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 3: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Near-Threshold Operation

3

Voltage(V)

NTVdd

NominalVdd

Vth

Frequency

10.80.40.30.20

~1/10~1/100

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 4: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

•  PerformancedegradaNon

Challenges in Near-Threshold

4

0

0.002

0.004

0.006

0.008

0.01

0.012

0 0.5 1 1.5 2

Prob

abili

ty d

istri

butio

n

Relative frequency

900mV, Vth σ/µ= 9%900mV, Vth σ/µ=12%400mV, Vth σ/µ= 9%400mV, Vth σ/µ=12%

00.10.20.30.40.50.60.70.80.91

NominalVdd Near-ThresholdVdd

Frac=on

ofTotalChipPo

wer

DynamicandLeakagePowerBreakdownfora64-CoreCMP

dynamic-core

dynamic-cache

leakage-core

leakage-cache

•  AmplifiedprocessvariaNon

•  Leakagepowerdominates

•  TheiniNalideaofRespin–BuildcachesinNT-CMP

with“leakage-free”non-volaNlememoriesto

reducepowerconsumpNon

~39%

~73%

~35%

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

TotalLeakage

TotalLeakage

CacheLeakage

Page 5: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Outline

• BasicsofNVM• RespinArchitecture• MethodologyandEvaluaNon• Conclusion

5Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 6: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Non-Volatile Memory Basics • Non-VolaNlity–ResistanceasdatarepresentaNon(e.g.PCRAM,STT-RAM,ReRAM,etc.)

• Near-ZeroLeakagePower–Goodfitforfuturepower-constrainedcompuNng

• HighDensity–Greatdesigncandidateinthebigdataera

• GoodPerformance–Feasibleforon-chipstoragereplacement

6Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 7: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

STT-RAM •  UniquefeaturesofSTT-RAM:fastreadspeed,lowreadenergy,unlimitedwriteendurance,andgoodcompaNbilitywithCMOStechnology

•  Shortcomings:longwritelatencyandhighwriteenergy

7

AnN-parallelstate:high-resistance,represenNnga“1”

Parallelstate:low-resistance,represenNnga“0”

MTJ(MagneNcTunnelJuncNon)

STT-RAMCell(1T-1MTJ)

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 8: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

STT-RAM is Good Fit for NT-CMP

8

NT-CMP STT-RAM

Highleakagepower Near-zeroleakagepower

LowoperaNngfrequency

Longlatencywrites

Largenumbersofcoresrequiringhigh

cachecapacity

Highdensity(~4xdenserthanSRAM)

UnreliablefuncNonalunits

Robustfromsoh-errors

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 9: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Outline

• BasicsofNVM• RespinArchitecture• MethodologyandEvaluaNon• Conclusion

9Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 10: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Respin Architecture

10

Cluster 0

Core 0 Core 1 Core 2 Core 7Core 6Core 5Core 4

Core 8 Core 9 Core 10 Core11 Core 15Core 14Core 13Core 12

Core 3

Cluster 2

Core 0 Core 1 Core 2 Core 7Core 6Core 5Core 4

Core 8 Core 9 Core 10 Core11 Core 15Core 14Core 13Core 12

Core 3

Cluster 3

Core 0 Core 1 Core 2 Core 7Core 6Core 5Core 4

Core 8 Core 9 Core 10 Core11 Core 15Core 14Core 13Core 12

Core 3

Cluster 1

Core 0 Core 1 Core 2 Core 7Core 6Core 5Core 4

Core 8 Core 9 Core 10 Core11 Core 15Core 14Core 13Core 12

Core 3

L1D L1IL2 Cache L2 Cache

Core 0 Core 1 Core 2 Core 7Core 6Core 5Core 4

Core 8 Core 9 Core 10 Core11 Core 15Core 14Core 13Core 12

Core 3

NT Vdd High Vdd

(a)

L3 Cache

(b)

NT Vdd High Vdd

•  CoresoperateatNT-Vddrailwithlowfrequencies•  CachesarebuiltwithSTT-RAMandoperateathigh-Vddrailmakingreadspeed

extremelyfast

•  Clustered-CMPwithfastSTT-RAMreadenableswithin-clustersharedL1cache

design,removingcoherencecostsRespin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory

Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 11: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Shared Cache Controller

11

Cache CLK (0.4ns)

Core0 (1.6ns) Core1 (2.4ns) Core2 (1.6ns) Core3 (1.6ns) Core4 (2.0ns)

Core0 (1.6ns) Core1 (2.4ns) Core2 (1.6ns) Core3 (1.6ns) Core4 (2.0ns)

cycle 0 cycle 1 cycle 2 cycle 3 cycle 4service C0

(a)

(b)

service C2 service C4service C3half-miss C3

cannot service C3 in time

service C1

Cache CLK (0.4ns)

cycle 0 cycle 1 cycle 2 cycle 3 cycle 4

half-miss C3

cycle 5 cycle 6

Level-shifting overhead

Level-shifting overhead

Request serviced

cycle 5 cycle 6

00011 01111

00011 00111

00001 00001

00001 00001 half-miss

00111 00011 00001

00011

00011

service C0 service C2 service C4service C3 service C1

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 12: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Dynamic Core Consolidation

12

•  HighprocessvariaNonandleakageinNT-CMPleadtofastcoresmore

energy-efficientthanslowones

•  Dynamicallyconsolidatethreads

ontomoreefficientcoreswith

greedysearchatrunNmecanfurthersaveenergy

•  Mechanismimplementedin

systemfirmware,energy-per-

instrucNonusedasgreedyselecNonmetric,andinstrucNon

countusedasevaluaNoninterval

System Firmware (ACPI)

Core Remapper

Filter Power Control

VC 0VC 2VC 3VC 4VC 13VC 14VC 15 . . .

Virtual Cores

VC 1

Retired Inst.Energy

01…

1514…

WeightPhy. ID

01…

32…

Phy. IDVirt. IDEnergy Table ID Map

OFFOFFVC 3VC 15VC 13VC 14 . . .

Physical Cores

C 0C 1C 2C 3C 4C 13C 14C 15 . . .

VC 1VC 2

VC 0VC 4

OFFOFF…

Status

Virtual Core Monitor

OS

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 13: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Greedy Selection

13

•  AgreedyapproachisusedtosearchforopNmalsystemconfiguraNonsat

applicaNonrunNme

•  Energy-Per-InstrucNon(EPI)asevaluaNonmetric

•  InstrucNoncountasevaluaNoninterval

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 14: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Outline

• BasicsofNVM• RespinArchitecture• MethodologyandEvalua=on• Conclusion

14Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 15: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Methodology

15

Level Size(mm²) BlockSize Associa=vity Read/WritePorts

L1I(Private/SharedwithinCluster) 16KB(Private)/256KB(Sharedwithin

Cluster)32B

2-way

1/1

L1D(Private/SharedwithinCluster)

4-way

L2(SharedwithinCluster)

8MB(Small)/16MB(Medium)/32MB(Large)

64B 8-way

L3(SharedwithinChip)

24MB(Small)/48MB(Medium)/96MB(Large)

128B 16-way

Table1.SummaryofCacheParameters.

VddRail Area(mm²)

Read/WriteLatency(ns)

Read/WriteEnergy(pJ)

LeakagePower(mW)

SRAM(16KB×16) Low(0.65V)0.9176

1.337 2.578 573

SRAM(256KB)High(1.0V)

0.5336 42.41 881

STT-RAM(256KB) 0.2451 0.3774/5.208 29.32/209.3 114

Table2.ComparisonofSRAMvs.STT-RAMTechnologyParameters.Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory

Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 16: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Methodology

16

CMPArchitecture

Cores 64out-of-order

Fetch/Issue/CommitWidth 2/2/2

RegisterFileSize 76int,56fp

InstrucNonWindowSize 56int,24fp

ReorderBufferSize 80entries

Load/StoreQueueSize 38entries

NoCInterconnect 2DTorus

CoherenceProtocol MESI

ConsistencyModel ReleaseConsistency

Technology 22nm

NT-Vdd 0.4V(Core),0.65V(Cache)

Nominal-Vdd 1.0V

CoreFrequencyRange 375MHz–725MHz

MedianCoreFrequency 500MHz

VariaNonParameters

Vthstd.dev./mean(σ/μ) 12%(Chip),10%(Cluster)

Table3.SummaryofExperimentalParameters.

•  SimulaNonFramework:

•  SESCforarchitecturalsimulaNon

•  CACTI,McPAT,andNVSimfor

latency,power,energy,andareasimulaNon

•  Benchmarks:

•  SPLASH2andPARSEC

•  MainEvaluatedConfiguraNon:

•  64-coreCMPwithfour16-coreclusters

•  MediumsizeL2andL3caches

•  0.4nssharedL1cachereadlatency

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 17: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Power and Performance

17

SmallCache MediumCacheLargeCache

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

Normalized

TotalChipPo

wer

PowerReduc=onofProposedDesignwithThreeL2/L3CacheSizes

leakage dynamic

~3%~14%

~23%

•  Respinachieved14%powerreducNonand11%performanceimprovementwithmediumsizedcache

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

1.0

0.22

0.90 0.89

0

0.2

0.4

0.6

0.8

1

GEOMEAN

Normalized

Execu=on

Tim

e

Rela=veExecu=onTimeforMediumCacheSize

PR-SRAM-NT HP-SRAM-CMP SH-SRAM-Nom SH-STT

Page 18: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Energy Consumption

18

•  Formediumsizedcache,Respinachieved22%energysavingswiththebasicshared

STT-RAMcachedesignplusaddiNonal10%withcoreconsolidaNonenabled

0.87

0.78

0.69

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

SmallCache MediumCache LargeCache

Normalized

Ene

rgyCo

nsum

p=on

EnergyConsump=onforThreeL2/L3CacheSizes

PR-SRAM-NT SH-SRAM-Nom SH-STT

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

1.0

1.39

1.12

0.78

0.680.65

0.76

0.99

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

1.4

GEOMEAN

Normalized

Ene

rgyCo

nsum

p=on

EnergyConsump=onforDifferentCoreConsolida=onConfigura=ons

PR-SRAM-NT HP-SRAM-CMP SH-SRAM-Nom SH-STTSH-STT-CC SH-STT-CC-Oracle PR-STT-CC SH-STT-CC-OS

Page 19: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Core Consolidation Analysis

19

•  Inmostcasesourgreedyalgorithmmatcheswellwiththeoraclewhileinveryfewcasessub-

opNmalselecNonbecomesthebarriertoslowdownthepaceofourgreedymechanism

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

lu(SPLASH2):greedyachieved29%energysavingswhileoracle

achieved38%

0

2

4

6

8

10

12

14

16

0 50 100 150 200 250 300 350 400 450 500

Num

ber

of

OF

F C

ore

s

Number of Million Instructions

SH-STT-CC SH-STT-CC-Oracle

radix(SPLASH2):greedyachieved48%energysavingswhileoracleachieved50%

0

2

4

6

8

10

12

14

16

0 20 40 60 80 100 120 140 160

Num

ber

of

OF

F C

ore

s

Number of Million Instructions

SH-STT-CC SH-STT-CC-Oracle

Sub-op=mal

Page 20: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Shared Cache Access Load

20

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 1 2 3 >=4

Percen

tageofTotalCache

Cycles

NumofRequestsArrivingatDL1Cache

SharedDL1CacheU=liza=onRateinOneNT-CMPCluster

blackscholes

cholesky

radix

raytrace

streamcluster

AVERAGE0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 2 >=3

Percen

tageofD

L1ReadHitReq

uests

NumofCoreCyclesServiced

Frac=onofReadHitRequestsServicedbytheSharedDL1in1,2,ormorecycles

blackscholes

cholesky

radix

raytrace

streamcluster

AVERAGE

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

•  Morethan95%oftheincomingrequestscanbeservicedinone

processorcorecycle

Page 21: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Sensitivity Study on Cluster Size

21

ClusterSize(#cores) SharedL1(I/D)Size(KB) PerformanceGain(%)

4 64 4.82

8 128 6.29

16 256 10.81

32 512 2.50

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

•  16-coreclusterprovidesthebesttrade-offbetweendatasharingamong

coresandsharedcacheaccessload,giving~11%performancegain

Page 22: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Number of Active Cores in Dynamic Core Consolidation

22Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

0 1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16

barnescholesky

fft lu oceanradiosity

radixraytrace

water-nsquared

blackscholes

bodytrackstreamcluster

swaptions

AVG

Aver

age

Num

ber

of

Act

ive

Core

s

SH-STT-CC SH-STT-CC-Oracle

•  StrongphasedbehaviorcanbeobservedacrossallapplicaNons,making

ourdynamiccoreconsolidaNonmechanismveryuseful

Page 23: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Outline

• BasicsofNVM• RespinArchitecture• MethodologyandEvaluaNon• Conclusion

23Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 24: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Conclusion •  Thefirstworktoexploretheuseofnon-volaNlecachesinnear-thresholdchipmulN-processors

•  AnovelarchitecturedesignedtoenhanceNT-CMPperformanceandreduce

energyconsumpNonbysharingL1cachesandimplemenNngdynamiccore

consolidaNonmechanism

•  AchievedenergyreducNonby33%andimprovedperformanceby11%

24Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Page 25: Respin: Rethinking Near- Threshold Multiprocessor Design ... · Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu

Questions?

Thankyou!

25Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory Xiang Pan, Anys Bacha, and Radu Teodorescu