
Signature Buffer: Bridging Performance Gap between Registers and Caches


Page 1

Lu Peng, Jih-Kwon Peir, Konrad Lai

Page 2

Introduction

Two types of storage

– Registers
    Fast and small
    Supply data for operations

– Memory
    Large and slow
    Cache for recently used data

Most RISC processors operate only on data held in registers

Data communication path
– Producer -> store -> load -> consumer

Page 3

Introduction

Future processors with 35nm technology
– 10 GHz clock
– 64 KB L1 cache
– 3-7 cycles L1 cache access time
– IPC degrades by 3.5% per additional cycle on L1 cache access time
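
To put the last figure in context, here is a small arithmetic sketch. The slide does not say whether the 3.5% loss accumulates linearly or compounds, so both readings are shown as assumptions; the function names are illustrative only.

```python
PER_CYCLE_LOSS = 0.035  # from the slide: IPC drops 3.5% per extra L1 cycle


def ipc_linear(extra_cycles: int) -> float:
    """Relative IPC if each extra cycle costs 3.5% of the baseline (assumption)."""
    return 1.0 - PER_CYCLE_LOSS * extra_cycles


def ipc_compounded(extra_cycles: int) -> float:
    """Relative IPC if each extra cycle costs 3.5% of the current IPC (assumption)."""
    return (1.0 - PER_CYCLE_LOSS) ** extra_cycles


# Going from a 3-cycle to a 7-cycle L1 cache adds 4 cycles.
for extra in range(5):
    print(f"+{extra} cycles: linear {ipc_linear(extra):.3f}, "
          f"compounded {ipc_compounded(extra):.3f}")
```

Under either reading, moving from the low end to the high end of the 3-7 cycle range costs roughly 13 to 14 percent of IPC.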

Page 4

Signature Buffer

Zero-cycle load
– "The load and its dependent instructions can be fetched, dispatched and executed at the same time"

Avoid address calculation
– Each load and store uses a signature for accessing the storage

The signature buffer can be accessed in early pipeline stages

A signature consists of:
– Color of the base register
– Displacement value
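
A minimal sketch of how such a signature could be formed without computing an address. The slides only state the two components; the rule that a register receives a fresh color when it is overwritten, the 32-register table, and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature: (color of base register, displacement)."""
    color: int
    displacement: int


@dataclass
class ColorTable:
    """Assumed bookkeeping: one color per architectural register."""
    num_regs: int = 32
    colors: list = field(default_factory=list)

    def __post_init__(self):
        # Initially register i carries color i; fresh colors start at num_regs.
        self.colors = list(range(self.num_regs))
        self._next = count(self.num_regs)

    def write_register(self, reg: int) -> None:
        # Assumption: overwriting a register gives it a fresh color, so
        # signatures formed from the old value can no longer match.
        self.colors[reg] = next(self._next)

    def signature(self, base_reg: int, displacement: int) -> Signature:
        # No address arithmetic: only the color and the encoded displacement.
        return Signature(self.colors[base_reg], displacement)


table = ColorTable()
store_sig = table.signature(base_reg=1, displacement=100)  # store to 100(r1)

table.write_register(2)  # an unrelated register is overwritten
still_matches = table.signature(1, 100) == store_sig       # load from 100(r1)

table.write_register(1)  # the base register itself is overwritten
stale = table.signature(1, 100) == store_sig
```

In this sketch a store and a later load through the same unmodified base register and displacement produce equal signatures, so they can be correlated by a simple comparison early in the pipeline.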

Page 5

Outline

Motivation

Implementation

Performance evaluation

Page 6

Motivation – Memory Reference Correlations

Signature correlations
– Store-load and load-load can be correlated directly by the signature

Signature reference locality
– Nearby memory references often use the same base register and differ only by a small displacement value

Page 7

Example 1

Source and Assembly Codes of Function copy_disjunct from Parser

Signature correlations

Signature reference locality

Page 8

Example 2

Source and Assembly Codes of Function bsW from Bzip

Page 9

Signature Buffer

Page 10

Signature Buffer

Initial State

[Figure labels: 0, 1, 2, 3; 32]

Page 11

Signature Buffer

[Figure labels: 0, 1, 2 -> 32, 3; 32 -> 33; 1 100; 1 -- 100]

Page 12

Data Alignment

Page 13

Data Alignment

SB Directory
  SB tag   L1 tag   Valid   Bound

SB Data Array

L1 Tag Array
  Tag

L1 Data Array

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

Page 14

Data Alignment

SB Directory
  SB tag   L1 tag   Valid   Bound
  A        C        I-V     101

SB Data Array (figure labels): 000011

L1 Tag Array
  Tag
  C
  D

L1 Data Array (figure labels): 101000

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

SB MISS!

Page 15

Data Alignment

SB Directory
  SB tag   L1 tag   Valid   Bound
  A        C        V-V     101

SB Data Array (figure labels): 101011, 000011

L1 Tag Array
  Tag
  C
  D

L1 Data Array (figure labels): 101000, 000000

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

SB MISS!

Page 16

Data Alignment

SB Directory
  SB tag   L1 tag   Valid   Bound
  A        C        V-V     101
  B        D        I-V     101

SB Data Array (figure labels): 101011, 000011, 010100

L1 Tag Array
  Tag
  C
  D

L1 Data Array (figure labels): 101000, 101011, 000000

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

SB MISS!

Page 17

Data Alignment

SB Directory
  SB tag   L1 tag   Valid   Bound
  A        C        I-V     101
  B        D        I-I     101

SB Data Array (figure labels): 101011, 000011, 010100

L1 Tag Array
  Tag
  C
  D

L1 Data Array (figure labels): 101000, 101011, 000000

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

SB MISS!

Invalidate high A, low B
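
One reading of the Bound column, inferred from the request trace rather than stated on the slides: an SB line is aligned by signature displacement while an L1 line is aligned by real address, so an SB line can straddle two L1 lines, and Bound would be the displacement at which the second L1 line begins. The sketch below checks that reading against the trace; the 3-bit offset width is taken from the slide values, and the function names are hypothetical.

```python
OFFSET_BITS = 3                 # offsets on the slides are 3 bits wide
LINE_SIZE = 1 << OFFSET_BITS


def bound(sig_disp: int, addr_offset: int) -> int:
    """Displacement at which the SB line crosses into the next L1 line.

    Returns 0 when the SB line and the L1 line are aligned (assumed reading).
    """
    skew = (addr_offset - sig_disp) % LINE_SIZE
    return (LINE_SIZE - skew) % LINE_SIZE


def half(sig_disp: int, bound_value: int) -> str:
    """Which part of the SB line a displacement falls in (assumed naming)."""
    if bound_value == 0:
        return "whole"
    return "high" if sig_disp >= bound_value else "low"


# (signature tag, displacement, real-address tag, real-address offset)
trace = [("A", 0b001, "C", 0b100),
         ("A", 0b101, "D", 0b000),
         ("B", 0b010, "D", 0b101),
         ("X", 0b000, "D", 0b000)]

for sb_tag, disp, l1_tag, offset in trace:
    b = bound(disp, offset)
    print(f"{sb_tag}-{disp:03b} -> {l1_tag}-{offset:03b}: "
          f"bound {b:03b}, {half(disp, b)} part of SB line {sb_tag}")
```

With this reading, both A and B get Bound 101 as in the directory, the high part of A and the low part of B both come from L1 line D, and the last request X-000 is aligned with line D, which is consistent with "Invalidate high A, low B".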

Page 18

Microarchitecture

Bypass I
– SB hit or an early store-load forwarding

Bypass II
– Normal store-load forwarding

Page 19

Microarchitecture

Page 20

Performance Evaluation

Page 21

Performance Evaluation – IPC

SB – nospec: 13% speedup

SB – perfect: 14% speedup

Page 22

Performance Evaluation – Load Distribution

Normal S-L forwarding and L1 accesses are reduced to 30% of loads; 70% of loads benefit from the SB

With a perfect memory dependence predictor, 23% of loads are zero-cycle loads

Page 23

Performance Evaluation – SB Hit Ratio

Average SB hit rate is about 51%

Page 24

Performance Evaluation – Comparison with L0 Cache

The performance benefit of the SB goes up with L1 latency and is always above that of an L0 cache

Page 25

Performance Evaluation – Comparison with L0 Cache

Larger L0 => higher hit rate

SB is less sensitive to size.

Page 26

Advantages

Non-speculative
– Data obtained from the SB without intervening stores is always correct

All loads can access the data from the SB without any restriction on the type of the loads or base registers.

Loads through the SB can bypass the address generation and cache access completely.

Store/Load correlation is established from the instruction encoding bits to simplify hardware requirements.

SB uses line-based granularity to capture spatial locality.

Page 27

Questions?

Page 28

Loads – SB Specific

Early S-L forwarding
– A load has the same signature as an earlier store in the LSQ, with no intervening store in between (zero-cycle load & SB hit)

Early SB access
– SB is accessed after a load is fetched and decoded (zero-cycle load & SB hit)

Delayed SB access
– SB is accessed after memory dependence resolution because of intervening stores (SB hit)

Non-Signature Forwarding
– Consecutive SB misses to the same SB line get forwarded data from previous misses (SB miss)
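
The four categories above can be read as a decision procedure. The sketch below is one possible ordering of the checks, not the authors' logic: the boolean inputs, the precedence among the cases, and the fallback category are all assumptions for illustration.

```python
from enum import Enum


class LoadKind(Enum):
    EARLY_SL_FORWARDING = "early S-L forwarding"
    EARLY_SB_ACCESS = "early SB access"
    DELAYED_SB_ACCESS = "delayed SB access"
    NON_SIGNATURE_FORWARDING = "non-signature forwarding"
    NORMAL = "normal S-L forwarding or L1 access"


def classify_load(store_signature_match: bool,
                  intervening_store: bool,
                  sb_hit: bool,
                  earlier_miss_same_line: bool) -> LoadKind:
    """Classify a load using hypothetical per-load flags."""
    # An earlier store in the LSQ with the same signature and nothing in between.
    if store_signature_match and not intervening_store:
        return LoadKind.EARLY_SL_FORWARDING
    if sb_hit:
        # Intervening stores force the SB access to wait for dependence resolution.
        if intervening_store:
            return LoadKind.DELAYED_SB_ACCESS
        return LoadKind.EARLY_SB_ACCESS
    # SB miss, but an earlier miss to the same SB line can supply the data.
    if earlier_miss_same_line:
        return LoadKind.NON_SIGNATURE_FORWARDING
    return LoadKind.NORMAL


def is_zero_cycle(kind: LoadKind) -> bool:
    """Per the slide, only the two early cases are zero-cycle loads."""
    return kind in (LoadKind.EARLY_SL_FORWARDING, LoadKind.EARLY_SB_ACCESS)


example = classify_load(store_signature_match=False, intervening_store=True,
                        sb_hit=True, earlier_miss_same_line=False)
print(example.value, is_zero_cycle(example))
```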