Signature Buffer: Bridging Performance Gap between Registers and Caches
Lu Peng, Jih-Kwon Peir, Konrad Lai

Introduction
- Two types of storage
  - Registers
    - Fast and small
    - Supply data for operations
  - Memory
    - Large and slow
    - Cache for recently used data
- Most RISC processors operate only on data from registers
- Data communication path
  - Producer -> store -> load -> consumer

Introduction
- Future processors with 35 nm technology
  - 10 GHz clock
  - 64 KB L1 cache
  - 3-7 cycle L1 cache access time
  - IPC degrades by 3.5% per additional cycle of L1 cache access time
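To put the last bullet in perspective, here is a minimal arithmetic sketch. It assumes the 3.5% loss applies linearly to each extra cycle; the slide does not say whether the loss compounds, and the function name `ipc_loss` is made up for illustration.

```python
IPC_LOSS_PER_CYCLE = 0.035  # from the slide: 3.5% IPC loss per extra L1 cycle


def ipc_loss(base_cycles: int, actual_cycles: int) -> float:
    """Linear estimate of the relative IPC loss (assumption: no compounding)."""
    return IPC_LOSS_PER_CYCLE * (actual_cycles - base_cycles)


# Low end versus high end of the 3-7 cycle range quoted on the slide.
print(f"{ipc_loss(3, 7):.1%}")
```

Under that linear assumption, moving from a 3-cycle to a 7-cycle L1 costs about 14% of IPC.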

Signature Buffer
- Zero-cycle load
  - "The load and its dependent instructions can be fetched, dispatched and executed at the same time"
- Avoid address calculation
  - Each load and store uses a signature for accessing the storage
- The signature buffer can be accessed in early pipeline stages
- A signature consists of:
  - Color of the base register
  - Displacement value
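The idea of addressing storage by signature rather than by computed address can be sketched in a few lines. This is a toy software model, not the hardware design: the class and method names are hypothetical, and the color scheme (register i starts with color i, and a global counter hands out a fresh color whenever a base register is rewritten) is an assumption that the slides do not spell out.

```python
from typing import Dict, List, NamedTuple, Optional


class Signature(NamedTuple):
    color: int  # color currently attached to the base register
    disp: int   # displacement taken from the instruction encoding


class SignatureBufferSketch:
    def __init__(self, num_regs: int = 32) -> None:
        # Assumption: register i starts with color i; new colors come from a counter.
        self.colors: List[int] = list(range(num_regs))
        self.next_color = num_regs
        self.data: Dict[Signature, int] = {}

    def signature(self, base_reg: int, disp: int) -> Signature:
        # No base + displacement addition: a table lookup plus instruction bits.
        return Signature(self.colors[base_reg], disp)

    def write_register(self, base_reg: int) -> None:
        # Rewriting the base register gives it a new color,
        # so signatures built from the old value stop matching.
        self.colors[base_reg] = self.next_color
        self.next_color += 1

    def store(self, base_reg: int, disp: int, value: int) -> None:
        self.data[self.signature(base_reg, disp)] = value

    def load(self, base_reg: int, disp: int) -> Optional[int]:
        # Returns None on a miss; real hardware would fall back to the L1 cache.
        return self.data.get(self.signature(base_reg, disp))


buf = SignatureBufferSketch()
buf.store(base_reg=2, disp=100, value=42)
print(buf.load(base_reg=2, disp=100))  # same color and displacement: hit
buf.write_register(2)
print(buf.load(base_reg=2, disp=100))  # base register changed: miss
```

Because the lookup key comes only from the register's color and the instruction's displacement field, it is available as soon as the instruction is decoded, which is what lets the buffer be probed in early pipeline stages.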

Outline
- Motivation
- Implementation
- Performance evaluation

Motivation - Memory Reference Correlations
- Signature correlations
  - Store-load and load-load can be correlated directly by the signature
- Signature reference locality
  - Nearby memory references often differ by a small displacement value with the same base register
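Signature reference locality can be illustrated by splitting a signature into a line tag and an offset within the line. The sketch below is an assumption-laden illustration: the 8-entry line size mirrors the 3-bit offsets used in the Data Alignment slides, and the helper name `sb_line` is hypothetical.

```python
from typing import Tuple

SB_LINE_SIZE = 8  # assumption: 3-bit offsets, as in the Data Alignment example


def sb_line(color: int, disp: int) -> Tuple[Tuple[int, int], int]:
    """Split a signature into (line tag, offset within the line)."""
    return (color, disp // SB_LINE_SIZE), disp % SB_LINE_SIZE


# Two references off the same base register with nearby displacements
# share a line tag and differ only in the offset.
tag_a, off_a = sb_line(color=5, disp=17)
tag_b, off_b = sb_line(color=5, disp=20)
print(tag_a == tag_b, off_a, off_b)
```

A miss on one of the two references can therefore bring in the line that the other will hit.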

Example 1
[Figure: Source and Assembly Codes of Function copy_disjunct from Parser, annotated with "Signature correlations" and "Signature reference locality"]

Example 2
[Figure: Source and Assembly Codes of Function bsW from Bzip]

Signature Buffer

Signature Buffer
[Figure: Initial State; surviving labels: "0123", "32"]

Signature Buffer
[Figure: later state; surviving labels: "01", "2 -> 323", "32 -> 33", "1 100", "1 -- 100"]

Data Alignment

Data Alignment
SB Directory
  SB tag  L1 tag  Valid  Bound
  (empty)
SB Data Array
L1 Tag Array (Tag)
L1 Data Array
Requests (signature):    A-001 -> A-101 -> B-010 -> X-000
Requests (real address): C-100    D-000    D-101    D-000

Data Alignment
SB Directory
  SB tag  L1 tag  Valid  Bound
  A       C       I-V    101
SB Data Array: 000011
L1 Tag Array (Tag): C, D
L1 Data Array: 101000
Requests (signature):    A-001 -> A-101 -> B-010 -> X-000
Requests (real address): C-100    D-000    D-101    D-000
SB MISS!

Data Alignment
SB Directory
  SB tag  L1 tag  Valid  Bound
  A       C       V-V    101
SB Data Array: 101011, 000011
L1 Tag Array (Tag): C, D
L1 Data Array: 101000, 000000
Requests (signature):    A-001 -> A-101 -> B-010 -> X-000
Requests (real address): C-100    D-000    D-101    D-000
SB MISS!

Data Alignment
SB Directory
  SB tag  L1 tag  Valid  Bound
  A       C       V-V    101
  B       D       I-V    101
SB Data Array: 101011, 000011, 010100
L1 Tag Array (Tag): C, D
L1 Data Array: 101000, 101011, 000000
Requests (signature):    A-001 -> A-101 -> B-010 -> X-000
Requests (real address): C-100    D-000    D-101    D-000
SB MISS!

Data Alignment
SB Directory
  SB tag  L1 tag  Valid  Bound
  A       C       I-V    101
  B       D       I-I    101
SB Data Array: 101011, 000011, 010100
L1 Tag Array (Tag): C, D
L1 Data Array: 101000, 101011, 000000
Requests (signature):    A-001 -> A-101 -> B-010 -> X-000
Requests (real address): C-100    D-000    D-101    D-000
SB MISS! Invalidate high A, low B
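The four-request walk-through can be replayed with a small software model. This is a sketch under stated assumptions, not the authors' hardware: it reads the Bound field as the first signature offset that crosses into the next L1 line, reads the Valid field as "high part-low part", invalidates any SB part whose real addresses overlap a newly filled part, and places L1 lines C and D at real addresses 0 and 8. All names (`SBEntry`, `access`, `overlaps`) are hypothetical. The slides do not show whether signature X gets its own directory entry; the sketch allocates one.

```python
from dataclasses import dataclass
from typing import Dict

LINE = 8  # the slide example uses 3-bit offsets


@dataclass
class SBEntry:
    base: int                 # real address of signature offset 0 (assumed model)
    valid_low: bool = False
    valid_high: bool = False

    @property
    def bound(self) -> int:
        # First signature offset that falls into the next L1 line.
        return LINE - self.base % LINE

    def span(self, high: bool) -> range:
        # Real addresses covered by the low or the high part of this SB line.
        edge = self.base + self.bound
        return range(edge, self.base + LINE) if high else range(self.base, edge)

    def valid(self) -> str:
        # Printed as high-low, matching the reading of "I-V" assumed above.
        return "-".join("V" if v else "I" for v in (self.valid_high, self.valid_low))


def overlaps(a: range, b: range) -> bool:
    return max(a.start, b.start) < min(a.stop, b.stop)


def access(sb: Dict[str, SBEntry], tag: str, sig_off: int, real_addr: int) -> bool:
    """Return True on an SB hit; on a miss, fill the part and drop overlapping copies."""
    entry = sb.setdefault(tag, SBEntry(base=real_addr - sig_off))
    high = sig_off >= entry.bound
    if entry.valid_high if high else entry.valid_low:
        return True
    mine = entry.span(high)
    for other in sb.values():
        if other is entry:
            continue
        if overlaps(mine, other.span(False)):
            other.valid_low = False
        if overlaps(mine, other.span(True)):
            other.valid_high = False
    if high:
        entry.valid_high = True
    else:
        entry.valid_low = True
    return False


C, D = 0 * LINE, 1 * LINE  # hypothetical placement of L1 lines C and D
requests = [("A", 0b001, C + 0b100), ("A", 0b101, D + 0b000),
            ("B", 0b010, D + 0b101), ("X", 0b000, D + 0b000)]
sb: Dict[str, SBEntry] = {}
for tag, off, addr in requests:
    hit = access(sb, tag, off, addr)
    print(tag, "hit" if hit else "miss", {t: e.valid() for t, e in sb.items()})
```

Under these assumptions the model yields a bound of 101 for both A and B and steps through the same Valid states as the slides: A goes I-V, then V-V; B enters as I-V; and the access through X, which covers all of line D, invalidates the high part of A and the low part of B.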

Microarchitecture
- Bypass I
  - SB hit or an early store-load forwarding
- Bypass II
  - Normal store-load forwarding

Microarchitecture

Performance Evaluation

Performance Evaluation - IPC
- SB - nospec: 13% speedup
- SB - perfect: 14% speedup

Performance Evaluation - Load Distribution
- Normal S-L forwarding and L1 accesses are reduced to 30%; 70% of loads benefit from the SB
- SB with a perfect memory dependence predictor obtains 23% zero-cycle loads

Performance Evaluation - SB Hit Ratio
- Average SB hit rate is about 51%

Performance Evaluation - Comparison with L0 Cache
- The performance benefit of the SB goes up with L1 latency and is always above that of having an L0 cache

Performance Evaluation - Comparison with L0 Cache
- Larger L0 => higher hit rate
- SB is less sensitive to size

Advantages
- Non-speculative
  - Data obtained from the SB without intervening stores is always correct
- All loads can access the data from the SB without any restriction on the type of the loads or base registers.
- Loads through the SB can bypass the address generation and cache access completely.
- Store/load correlation is established from the instruction encoding bits to simplify hardware requirements.
- SB uses line-based granularity to capture spatial locality.

Questions?

Loads - SB Specific
- Early S-L forwarding
  - A load has a signature identical to that of an earlier store in the LSQ, with no intervening store in between (zero-cycle load & SB hit)
- Early SB access
  - SB is accessed after a load is fetched and decoded (zero-cycle load & SB hit)
- Delayed SB access
  - SB is accessed after memory dependence resolution because of intervening stores (SB hit)
- Non-signature forwarding
  - Consecutive SB misses to the same SB line get forwarded data from previous misses (SB miss)
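The four categories above can be read as a decision procedure. The sketch below is one possible reading, not the authors' logic: the function name, the boolean inputs, the order of the checks, and the fallback label "normal path" are all assumptions.

```python
def classify_load(sig_match_in_lsq: bool, intervening_store: bool,
                  sb_hit: bool, earlier_miss_same_line: bool) -> str:
    """Map a load onto one of the SB-specific categories (assumed check order)."""
    if sig_match_in_lsq and not intervening_store:
        return "early S-L forwarding"      # zero-cycle load
    if sb_hit:
        # Intervening stores force the SB access to wait for dependence resolution.
        return "delayed SB access" if intervening_store else "early SB access"
    if earlier_miss_same_line:
        return "non-signature forwarding"  # SB miss served by an earlier miss
    return "normal path"                   # hypothetical fallback: regular forwarding or L1


print(classify_load(True, False, True, False))
print(classify_load(False, True, True, False))
print(classify_load(False, False, False, True))
```

Only the first two categories named on the slide (early S-L forwarding and early SB access) are marked as zero-cycle loads; the delayed case still hits in the SB but only after memory dependences are resolved.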