ULC: A Unified Placement and Replacement Protocol in Multi-level Storage Systems
Song Jiang and Xiaodong Zhang
College of William and Mary

Multi-Level Buffer Caching in Distributed Systems
[Figure: clients connect over the network to a front-tier server, an end-tier server, and a disk array; each level holds a buffer cache]
Challenges to Improving Hierarchy Performance
[Figure: four LRU-managed caches, L1 through L4, stacked in a hierarchy]
(1) Can the hit rate of hierarchical caches match that of a single first-level cache whose size equals the aggregate size of the hierarchy?
(2) Can we make the caches close to the clients contribute more to the hit rate?
Reason I: Weakened Locality in Low-Level Caches
Low-level caches hold the misses from their upper-level buffer caches, and hits in them carry high latency. Requests with strong locality have already been filtered out by the high-level buffer caches close to the clients.
An Existing Solution: Redesigning Low-Level Cache Replacement Algorithms

Multi-Queue Replacement (MQ) [USENIX01]
To cope with weak locality, MQ is a frequency-based replacement algorithm: once a block is accessed, it is promoted to a higher queue; periodically, blocks in each queue are checked and low-frequency blocks are demoted to lower queues.
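The promotion/demotion mechanism described above can be sketched as follows. This is a minimal illustration, not the original MQ implementation: the queue count, the `lifetime` parameter, and the lazy demotion check are all simplifying assumptions.

```python
from collections import deque

class MQCache:
    """Minimal Multi-Queue (MQ) sketch: a block moves toward higher
    queues as its access count grows, and blocks untouched for
    `lifetime` accesses are demoted one queue down. Eviction takes
    the head of the lowest non-empty queue."""

    def __init__(self, capacity, num_queues=4, lifetime=8):
        self.capacity = capacity
        self.lifetime = lifetime
        self.queues = [deque() for _ in range(num_queues)]
        self.freq = {}      # block -> access count
        self.where = {}     # block -> index of the queue holding it
        self.expire = {}    # block -> tick after which it may be demoted
        self.clock = 0

    def _target_queue(self, count):
        # queue i roughly holds blocks accessed about 2^i times
        return min(count.bit_length() - 1, len(self.queues) - 1)

    def access(self, block):
        self.clock += 1
        hit = block in self.freq
        if hit:
            self.queues[self.where[block]].remove(block)
        elif len(self.freq) >= self.capacity:
            self._evict()
        self.freq[block] = self.freq.get(block, 0) + 1
        q = self._target_queue(self.freq[block])
        self.queues[q].append(block)
        self.where[block] = q
        self.expire[block] = self.clock + self.lifetime
        self._demote_expired()
        return hit

    def _demote_expired(self):
        # periodically push long-untouched blocks one queue down
        for i in range(1, len(self.queues)):
            q = self.queues[i]
            if q and self.expire[q[0]] < self.clock:
                b = q.popleft()
                self.queues[i - 1].append(b)
                self.where[b] = i - 1
                self.expire[b] = self.clock + self.lifetime

    def _evict(self):
        # victim comes from the lowest non-empty (least frequent) queue
        for q in self.queues:
            if q:
                victim = q.popleft()
                for d in (self.freq, self.where, self.expire):
                    del d[victim]
                return
```

A frequently accessed block climbs into a higher queue and outlives one-time blocks in the bottom queue, which is exactly the behavior that protects low-level caches from weak-locality streams.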
Drawbacks of MQ Replacement
- Inherits the weaknesses of frequency-based algorithms: not responsive to access-pattern changes, and contains workload-sensitive parameters.
- Cannot fully exploit the locality knowledge inherent in applications (the accurate information is in the high-level caches).
Motivation: conduct the locality analysis at the clients, where the original requests are generated.
Reason II: Undiscerning Redundancy among Levels of Buffer Caches
[Figure: blocks held in the client cache, the server cache, and both, with snapshots taken every 1000 references]
Another Existing Solution: Extending an Existing Replacement into a Unified Replacement
For example: unified LRU (uniLRU) [USENIX02]
[Figure: the client's L1 LRU stack and the server's L2 LRU stack coordinated through demotions]
Drawbacks of Unified LRU
- High-level caches are not well utilized.
- Large demotion overhead.
Our Approach: Unified Level-aware Caching (ULC)
- Blocks with weak locality are placed in the low-level buffer caches: (1) locality is analyzed at the client; (2) the analysis results direct the placement of blocks in the hierarchy.
- Redundancy among levels of the buffer caches is minimized by unified replacement based on client information.
[Figure: locality strength mapped onto cache levels]
Quantifying Locality Strength
- Locality strength is characterized by the Next Access Distance (NAD), which is unknown at the current reference.
- NAD is quantitatively predicted from the Last Access Distance (LAD) and the recency (R): LAD-R = max(LAD, R).
[Figure: last access position, current position, and predicted next access position on the unified LRU stack]
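The LAD measurement can be sketched on a unified LRU stack. At the moment a block is re-accessed, its recency R resets to 0 and its LAD is the stack depth at which it was found, so the LAD-R = max(LAD, R) estimate starts at LAD and rises with R as the block ages. A sketch only; first accesses get LAD = infinity.

```python
def access_distances(refs):
    """For each reference in `refs`, record (block, LAD): the depth in
    the unified LRU stack at which the block is found when accessed.
    That depth is the block's Last Access Distance, the value used to
    seed the LAD-R = max(LAD, R) prediction of its next access."""
    stack = []                       # unified LRU stack, MRU at index 0
    lads = []
    for b in refs:
        lad = stack.index(b) if b in stack else float('inf')
        lads.append((b, lad))
        if b in stack:
            stack.remove(b)
        stack.insert(0, b)           # re-accessed block becomes MRU
    return lads
```

For the stream a b c a b a, block a is first re-found at depth 2 (two distinct blocks intervened) and then at depth 1, showing its locality strengthening.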
Advantages of LAD-R over R
- LAD does not change until the next reference of the block.
- Accurate quantification.
Multi-Level Buffer Caching Protocol: Unified and Level-aware Caching (ULC)
- ULC, running on the first-level client, dynamically ranks the accessed blocks by their LAD-R values.
- Based on the ranking results, blocks are cached (placed) at levels L1, L2, …, accordingly.
- Low-level caches cache or replace blocks according to the instructions from the clients.
LAD-R Based Block Caching
- Exactly arranging the block layout by LAD-R ranking is expensive (at least O(log n)).
- An efficient two-phase LAD-R based caching scheme runs in O(1): LAD determines a block's placement at the time of retrieval (when R = 0); R is used for replacement after the block is cached.
LAD-R Based Placement and Replacement
- The LRU-stack position at which a block is accessed determines its placement.
- Its current LRU-stack position determines its replacement.
[Figure: blocks at various positions of the L1 and L2 LRU stacks]
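The O(1) placement rule above amounts to mapping a unified-LRU-stack position onto a level. A minimal sketch, with hypothetical level sizes:

```python
def level_for_position(pos, level_sizes):
    """Map a unified-LRU-stack position to a cache level: positions
    0..s1-1 belong to L1, positions s1..s1+s2-1 to L2, and so on.
    A block re-accessed at stack depth `pos` (its LAD) is placed at
    the level that depth falls into; a depth beyond the aggregate
    size means the block is not cached at all."""
    bound = 0
    for level, size in enumerate(level_sizes, start=1):
        bound += size
        if pos < bound:
            return level
    return None          # deeper than the whole hierarchy
```

With L1 = 4 and L2 = 4 slots, a block re-accessed at depth 2 stays in L1, one re-accessed at depth 6 is placed in L2, and one at depth 9 is left uncached.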
ULC Data Structure
[Figure: the unified LRU stack (uniLRU stack), with yardstick blocks marking the boundaries of levels L1, L2, and L3]
- A block's recency status is determined by its recency; its level status is determined by its LAD.
- The placement of a block is determined by its level status.
- The yardstick block is the one chosen for replacement at the corresponding level.
Two Operations in the ULC Protocol
Two request messages travel from the client to the low-level caches:
- Retrieve(b, i, j) (i >= j): retrieve block b from level Li, and cache it at level Lj as it passes level Lj on its route to level L1.
- Demote(b, i, j) (i < j): demote block b from level Li into level Lj.
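The two messages can be sketched as simple client-side structures. The `on_access` helper and its names are illustrative assumptions, not the paper's implementation; it only encodes the rule that an access to a block at level Li whose ranking assigns it to level Lj produces a Retrieve.

```python
from dataclasses import dataclass

@dataclass
class Retrieve:
    """Retrieve(b, i, j), i >= j: fetch block b from level Li and
    cache it at level Lj as it passes Lj on the way up to L1."""
    block: int
    from_level: int
    cache_at: int

@dataclass
class Demote:
    """Demote(b, i, j), i < j: push block b down from Li to Lj."""
    block: int
    from_level: int
    to_level: int

def on_access(block, current_level, target_level):
    """Client-side sketch: locality analysis decided the accessed
    block belongs at `target_level`, which must not be below where
    the block currently resides."""
    assert current_level >= target_level
    return Retrieve(block, current_level, target_level)
```

The slide example, block 11 moving from level 3 to level 2, corresponds to `on_access(11, 3, 2)`; making room at level 2 would then trigger a `Demote` of that level's LRU block.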
Example: block 11 has level status L3 and recency status R2, so the client issues Retrieve(11, 3, 2); caching block 11 at L2 then forces Demote(6, 2, 3) to make room.
[Figure: the uniLRU stack and level statuses before and after Retrieve(11, 3, 2) and Demote(6, 2, 3)]
ULC with Multiple Clients
[Figure: clients 1 and 2 each maintain a uniLRU stack over their L1 and L2 blocks with per-level yardsticks; the server maintains a global LRU (gLRU) stack over all clients' L2 blocks]
Performance Evaluation: Workload Traces
- RANDOM: spatially uniform distribution of references (synthetic).
- ZIPF: highly skewed reference distribution (synthetic).
- HTTPD: collected on a 7-node parallel web server (HP).
- DEV1: collected in an office environment over 15 consecutive days (HP).
- TPCC1: the I/O trace of the TPC-C database benchmark (IBM DB2).
Performance on a 3-level Structure
- Block size: 8 KB.
- Block transfer time between the client and the server buffer caches: 1 ms.
- Block transfer time between the server buffer cache and the RAM cache on the disk: 0.2 ms.
- Block transfer time between the disk RAM cache and the disk: 10 ms.
- Cache size: 100 MB at each level.
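With these transfer times, the average block access time follows from the hit distribution. The back-of-the-envelope model below is an assumption for illustration, not the simulator's exact cost model: an L2 hit costs one client-server transfer, an L3 hit adds the server-to-disk-RAM transfer, a miss adds the disk transfer, and each demotion costs one transfer over its boundary link.

```python
def avg_access_time(h1, h2, h3, demotion_rate_12=0.0, demotion_rate_23=0.0,
                    t12=1.0, t23=0.2, t3d=10.0):
    """Average block access time (ms) for the 3-level setup on this
    slide: t12 = 1 ms client<->server, t23 = 0.2 ms server<->disk RAM
    cache, t3d = 10 ms disk RAM cache<->disk. h1..h3 are per-level hit
    rates as fractions; demotion rates are demotions per access."""
    l2_hit = t12                     # cross one link
    l3_hit = t12 + t23               # cross two links
    miss = t12 + t23 + t3d           # go all the way to the disk
    return (h2 * l2_hit + h3 * l3_hit + (1 - h1 - h2 - h3) * miss
            + demotion_rate_12 * t12 + demotion_rate_23 * t23)
```

Under this model an all-miss workload pays 11.2 ms per access while an L1-resident one pays essentially nothing, which is why pushing hits toward the upper levels matters so much.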
Compared with indLRU, ULC significantly increases hit ratios; compared with uniLRU, ULC provides a better hit distribution.
Hit Rate Breakdown (hit rate in % at each level)

Workload  Scheme   Level 1  Level 2  Level 3
RANDOM    indLRU     19.50     1.67     0.30
RANDOM    uniLRU     19.50    19.58    19.50
RANDOM    ULC        19.55    19.51    19.56
HTTPD     indLRU     73.33     1.65     0.30
HTTPD     uniLRU     73.33     7.13     4.48
HTTPD     ULC        84.06     4.44     2.12
ZIPF      indLRU     76.20     2.10     0.50
ZIPF      uniLRU     76.21    10.74     5.14
ZIPF      ULC        77.54     9.94     5.16
DEV1      indLRU      6.50     0.28     0.00
DEV1      uniLRU      6.50     8.77    19.70
DEV1      ULC        17.31    12.95    16.55
TPCC1     indLRU     19.50     1.67     0.00
TPCC1     uniLRU      0.03    92.49     5.05
TPCC1     ULC        50.49    45.06     3.39
Average Block Access Time Breakdown (ms)

Workload  Scheme   L2 Hit Time  L3 Hit Time  Miss Penalty  Demotion Cost
RANDOM    indLRU      0.020       0.0035        8.789          0
RANDOM    uniLRU      0.198       0.234         4.636          0.934
RANDOM    ULC         0.193       0.236         4.636          0.295
ZIPF      indLRU      0.017       0.0028        2.770          0
ZIPF      uniLRU      0.0696      0.0528        1.682          0.306
ZIPF      ULC         0.044       0.025         1.0498         0.0113
HTTPD     indLRU      0.022       0.0072        2.3613         0
HTTPD     uniLRU      0.110       0.062         0.8857         0.26
HTTPD     ULC         0.099       0.062         0.8022         0.087
DEV1      indLRU      0.001       0            10.44           0
DEV1      uniLRU      0.087       0.235         7.282          1.096
DEV1      ULC         0.130       0.2           5.96           0.2
TPCC1     indLRU      0           0            11.2            0
TPCC1     uniLRU      0.924       0.061         0.27           1.15
TPCC1     ULC         0.450       0.041         0.142          0.017
Demotion Rates at Each Boundary for the Unified Caching Schemes (%)

Workload  L1-L2 (uniLRU)  L2-L3 (uniLRU)  L1-L2 (ULC)  L2-L3 (ULC)
RANDOM         80.5            60.9           24.6          23.5
ZIPF           26.7            19.5            0.7           1.4
HTTPD          23.8             7.8            7.8           4.7
DEV1           93.5            81.1           16.9          18.4
TPCC1         100.0             7.5            1.4           1.3
[Figure: Reference Rate (%) across ten recency segments for the ND, NLD, IRR-R, and R locality predictors on the ZIPF, CS, GLIMPSE, RANDOM, SPRITE, and MULTI workloads]
indLRU has a high miss penalty; uniLRU has a high demotion cost.
Performance on a Multi-client Structure
- httpd: collected on a 7-node parallel web server.
- openmail: collected on 6 HP 9000 K580 servers running the HP OpenMail application.
- db2: collected on an 8-node IBM SP2 system running an IBM DB2 database.
- Block size: 8 KB.
- Block transfer time between the clients and the server: 1 ms.
- Block transfer time between the server buffer cache and the disk: 10 ms.
- Cache size: 100 MB each (except for workload tpcc1, which is 50 MB).
The effect of cache pollution in MQ
Average Access Time (ms) vs. Server Cache Size

DB2
Size (MB)  indLRU     MQ   uniLRU    ULC
    64      9.745  9.713  10.095  7.326
   128      9.685  9.632  10.073  7.205
   256      9.685  9.470  10.031  7.077
   512      9.685  9.145   9.948  6.800
   768      9.685  8.820   9.856  6.555
  1024      9.685  8.494   9.709  6.199
  1280      9.685  8.157   9.424  5.966
  1536      9.640  7.812   8.873  5.780
  1792      9.624  7.456   8.005  5.662
  2048      9.609  7.100   7.530  5.597
  2304      9.501  6.744   7.063  5.430
  2560      9.149  6.394   6.352  5.256
  2816      9.032  6.052   6.258  5.084
  3072      8.657  5.713   6.101  4.913

HTTPD
Size (MB)  indLRU     MQ   uniLRU    ULC
     8      4.248  4.112   4.131  3.582
    16      4.187  3.950   3.878  3.344
    27.34   4.032  3.733   3.560  3.061
    32      3.943  3.663   3.440  2.959
    40      3.771  3.507   3.270  2.791
    50.78   3.520  3.312   3.052  2.590
    56      3.400  3.232   2.958  2.506
    64      3.225  3.116   2.830  2.384
    72      3.059  3.018   2.712  2.285
    80      2.901  2.900   2.597  2.181
    88      2.759  2.770   2.492  2.089
    96      2.628  2.668   2.389  2.005
   105.47   2.485  2.548   2.280  1.911
   112      2.391  2.464   2.209  1.853
   120      2.284  2.387   2.126  1.783
   128      2.184  2.288   2.049  1.718

OPENMAIL
Size (MB)  indLRU     MQ   uniLRU    ULC
   512      8.838  8.518   8.286  5.495
  1024      8.838  8.137   6.508  5.153
  1536      8.838  7.739   6.144  4.925
  2048      8.837  7.363   5.845  4.817
  2560      8.806  6.969   5.565  4.755
  3072      8.479  6.549   5.318  4.705
  3584      6.952  6.175   5.135  4.653
  4096      6.330  5.928   4.965  4.589
  4608      6.117  5.747   4.813  4.537
  5120      5.882  5.599   4.686  4.491
  5632      5.660  5.445   4.574  4.446
  6144      5.409  5.326   4.492  4.397
Large demotion cost in uniLRU
Summary
- We propose an effective way to quantify locality in multi-level caches.
- We design an efficient block placement/replacement protocol (ULC).
- ULC makes the layout of cached blocks in the hierarchy match their locality.
- Experiments show that ULC significantly outperforms existing schemes.
In a client-server distributed system, caches are distributed across the clients, the intermediate servers, and the built-in RAM memory of the disk array to reduce data access time. When a client requests a block, the block could be cached at any of these levels.
As CPU speed increases rapidly and more data-intensive applications emerge, caching becomes increasingly critical to sustaining high system performance by reducing I/O accesses. However, simply adding caches without smart management does not generate much benefit.
Most current systems independently apply a local LRU replacement policy at each level of the cache hierarchy, and several challenges arise in improving its performance. The first is question (1): suppose that putting all these caches together at the first level would yield an 80% hit rate. Previous studies show that the distributed caches usually achieve a much lower hit rate, say only 40%. Given the large gap between disk and cache latencies, raising the hit rate greatly reduces the average response time.
The hit times also differ across levels: high-level caches have small hit times, so we want more hits to land in the high-level caches to reduce response time. There are two reasons for the low hit ratios in distributed caches.
Reason I: access locality is weakened in the low-level caches. Low-level caches hold the misses from their upper-level buffer caches, and the requests with strong locality have been filtered out by the upper levels. In fact, only the first-level buffer cache is exposed to the original locality and has the highest potential to exploit it. This leaves the low-level caches seriously under-utilized.
Multi-Queue (MQ) is a replacement algorithm dedicated to low-level buffer caches. Recognizing that access locality in low-level caches is poor, MQ relies on frequency for its replacement decisions. MQ uses multiple LRU queues to distinguish blocks with different frequencies: once a block is accessed, it is promoted to a higher queue; periodically, blocks in each queue are checked and low-frequency blocks are demoted to lower queues. MQ also uses a FIFO queue to remember the access frequencies of recently evicted blocks for some period of time. MQ inherits the weaknesses of frequency-based algorithms: it is not responsive to access-pattern changes, it contains workload-sensitive parameters, and it cannot fully exploit the locality knowledge inherent in applications, because only the clients see the original access stream. These shortcomings motivate us to conduct locality analysis only at the clients and use the analysis results to guide low-level cache management. The other reason for the low hit ratios in distributed caches is the undiscerning redundancy among the levels of buffer caches.
Existing independent cache replacements can store the same data blocks at multiple levels of the hierarchy. In the worst case, the aggregate cache behaves as if it were only as large as the largest single cache in the hierarchy.
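This redundancy is easy to reproduce. The sketch below runs independent LRU at two levels, with the second level seeing only the first level's misses, and reports how much of the second-level cache merely duplicates the first; the cache sizes and the repeated-scan workload are illustrative assumptions.

```python
from collections import OrderedDict

class LRU:
    """Plain LRU cache; access() returns True on a hit."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.d = OrderedDict()       # LRU order, oldest first

    def access(self, b):
        hit = b in self.d
        if hit:
            self.d.move_to_end(b)
        else:
            if len(self.d) >= self.capacity:
                self.d.popitem(last=False)
            self.d[b] = True
        return hit

def independent_two_level(refs, c1, c2):
    """Independent LRU at both levels: the server cache (L2) only
    sees the client's (L1) misses. Returns the fraction of L2's
    final contents that duplicate L1's, i.e. the redundancy."""
    l1, l2 = LRU(c1), LRU(c2)
    for b in refs:
        if not l1.access(b):         # L1 miss falls through to L2
            l2.access(b)
    overlap = set(l1.d) & set(l2.d)
    return len(overlap) / max(len(l2.d), 1)
```

On a sequential scan of 8 blocks repeated three times through two 4-block caches, every access misses both levels, so the two caches evolve identically and end up 100% redundant, which matches the Sprite observation described in the next note.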
The figure shows the results obtained by simulating the client and server buffer caches when both use LRU replacement on a workload from the Sprite file system; both caches are 16 MB. Especially when a large data file is accessed sequentially, the contents of the two caches are almost identical.
The existing approach to removing this redundancy is to extend an existing replacement policy into a unified replacement. An example scheme is unified LRU, in which the actions on the two LRU stacks are coordinated through demotions: when the LRU block is evicted from the client, it is demoted into the MRU position of the server. The hierarchy then works like a single LRU stack, eliminating redundancy and making the second-level cache contribute to the hit rate as an extension of the first-level cache.
Though unified LRU can increase the hit ratio of multiple levels of caches, it has two problems: (1) it cannot ensure that the high-level caches receive more hits; (2) it can introduce a large number of demotions.
In this example access stream, all the hits land at a fixed position in the L2 cache while the L1 cache gets no hits, and each block access is accompanied by a demotion. This access pattern is common in database transaction-processing workloads, where repeated sequential accesses over a data set make unified LRU undesirable. In this case, even though the whole cache system has a high hit ratio under unified LRU, it can produce unnecessarily high block access response times.
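The pathology described above can be demonstrated with a small unified-LRU simulation. The sizes and the repeated-scan trace are illustrative assumptions; the point is that every hit lands in L2 and nearly every access triggers a demotion.

```python
def unified_lru(refs, c1, c2):
    """Unified LRU across two levels: the aggregate behaves as one
    LRU stack of size c1 + c2, and every block evicted from L1 is
    demoted into L2's MRU position. Returns (l1_hits, l2_hits,
    demotions)."""
    l1, l2 = [], []                  # MRU at index 0
    l1_hits = l2_hits = demotions = 0
    for b in refs:
        if b in l1:
            l1_hits += 1
            l1.remove(b)
        elif b in l2:
            l2_hits += 1
            l2.remove(b)
        l1.insert(0, b)              # accessed block goes to L1's MRU
        if len(l1) > c1:             # L1 overflow: demote its LRU block
            demotions += 1
            l2.insert(0, l1.pop())
            if len(l2) > c2:
                l2.pop()             # L2 overflow: discard its LRU block
    return l1_hits, l2_hits, demotions
```

Scanning 6 blocks three times through L1 = 2 and L2 = 4 slots gives zero L1 hits, 12 L2 hits, and 16 demotions: a high aggregate hit ratio, but every hit pays the L2 latency plus a demotion.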
The essential cause of the problem is that the locality distribution does not match the level distribution: ideally, blocks with weak locality stay in the low-level caches. In general, our approach follows these considerations: the layout of cached blocks in the hierarchy dynamically matches the current locality strength of the accessed blocks, and the stronger a block's locality, the closer to the client it is cached. By a block with strong locality we mean one that is highly likely to be accessed soon, and that therefore contributes more to the hit ratio if cached than one with weak locality. Each replacement algorithm has its own criterion for quantifying locality strength.
The time from the current reference to the next reference is called the next locality distance (NLD), and the time from the last reference to the current reference is called the last locality distance (LLD); these correspond to the NAD and LAD of the slides. To account for the most recent references after a block's last reference, we use the larger of LLD and the recency R to approximate NLD, called LLD-R. The benefit of using LLD is that it does not change until the next reference of the block, and our previous work shows that this yields accurate and responsive quantification.
Our study shows that the position of a block in the IRR-R-ordered list is an accurate indicator for placing (caching) the block at the level corresponding to its list position. Using this indicator, together with knowledge of the cache size at each level, we propose a multi-level buffer caching protocol, called the ULC protocol, to effectively exploit hierarchical locality.
We use IRR-R to quantify locality. When IRR-R can predict the next access position of a block in the LRU stack, we have a more powerful means of managing the distributed caches. The figure illustrates IRR-R-based block placement and replacement: when a block is accessed, its IRR-R is the recency at which it is accessed, and the LRU position at which it is accessed determines its placement. In this example, if block 2 is accessed it is placed at level 1; if block 7 is accessed, it is placed at level 2. Compare this with unified LRU.
The current LRU position of a block determines its replacement: if a block is not accessed for some time and becomes the least recent block within its level, it is replaced from that level and demoted to the level below. Clients maintain the following data structure for locality analysis: a unified LRU stack, called the uniLRU stack, which keeps information on the recently accessed blocks. Whenever a block is accessed, it is placed at the top of the stack. Note that this does not mean the block is cached in the L1 cache; depending on the recency at which they are accessed, these blocks may be cached at any level of the hierarchy, or not cached at all.
In the figure, the blocks actually cached in L1 are shown in red, those in L2 in yellow, those in L3 in light blue, and uncached blocks in deep blue. The blocks in the same level can be viewed as organized into a separate LRU stack; we do not explicitly maintain these individual stacks, so they are virtual stacks. For each level of buffer cache there is a yardstick block: the block with the maximal recency among the blocks cached in that level, i.e., the bottom block of its virtual stack.
Each block has two statuses: a recency status and a level status. The recency status is determined by the block's current recency: for example, blocks above Y1 have recency status R1, and blocks between Y1 and Y2 have recency status R2. The level status is determined by the LAD, or more precisely by the recency status at the time of the block's reference: for example, if block 5 is accessed while its recency status is R1, its level status becomes L1. Blocks are stored in the level specified by their level status; e.g., block 5 with level status L1 is stored in L1. When there is not enough space in a level to store a new block, a cached block in that level must be replaced: the yardstick block, i.e., the bottom block of the corresponding virtual LRU stack, is the one replaced at that level.
All locality analysis is conducted at the client, which uses two request messages to guide block management in the low-level caches. Retrieve(b, i, j) is an extension of the original block request: retrieve block b from level Li, and cache it at level Lj as it passes Lj on its route to level L1. The accessed block is thus not necessarily cached in the L1 cache; if not cached in L1, it will soon be replaced from L1. Demote(b, i, j) (i < j) demotes block b from level Li into level Lj.
Here is an example to illustrate the operations. We are going to access block 11, whose status is level status L3 and recency status R2. When it is accessed, its level status changes from L3 to L2 because of R2, so we retrieve it from level 3 and cache it at level 2, and its recency status changes to R1. Because the level-2 cache holds four blocks, we demote the LRU block of level 2 into level 3 and adjust the yardstick position accordingly. The next example shows how ULC works with multiple clients: when multiple clients share one server, the server's buffers are no longer used solely by a single client. To facilitate the sharing and the allocation of server cache to the clients, we keep a global LRU stack (gLRU) in the server.
Initially the server's gLRU stack holds all the L2 blocks from clients 1 and 2. Then block 9 is accessed at client 1. Because block 9 has recency status R2 (it lies between yardsticks Y1 and Y2 in client 1's uniLRU stack), it becomes an L2 block and needs to be cached at the server. Because the server cache is full, the bottom block of gLRU, block 14, is replaced, and its owner, client 2, is notified. After the server buffer re-allocation (figure (b)), the server cache share of client 1 grows by one block and that of client 2 shrinks by one. In this way, the clients and the server cooperate to allocate the server buffers dynamically to the clients in accordance with their working-set sizes, which benefits the performance of the entire system.
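The server-side bookkeeping just described can be sketched as a single global LRU over (client, block) pairs. The class and method names are illustrative assumptions, not the paper's code; the point is that caching a block for one client may evict another client's globally least-recent block, with a notification to the evicted block's owner, so the allocation tracks each client's working set.

```python
from collections import OrderedDict

class SharedServerCache:
    """Server-side sketch for multiple ULC clients: all L2 blocks
    live in one global LRU (gLRU); on overflow the globally
    least-recent block is evicted and its owner is notified."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.glru = OrderedDict()    # (client, block) -> True, LRU first
        self.evicted = []            # eviction notices: (owner, block)

    def cache(self, client, block):
        key = (client, block)
        if key in self.glru:
            self.glru.move_to_end(key)       # refresh to MRU
            return
        if len(self.glru) >= self.capacity:
            (owner, b), _ = self.glru.popitem(last=False)
            self.evicted.append((owner, b))  # tell the owner its block left
        self.glru[key] = True

    def allocation(self, client):
        """How many server buffers this client currently occupies."""
        return sum(1 for (c, _) in self.glru if c == client)
```

With a 4-slot server shared by two clients, a fifth insertion by client 1 evicts client 2's oldest block, shifting the allocation from 2/2 to 3/1, exactly the re-allocation behavior the note describes.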
We use the workload traces above to evaluate the ULC protocol. First, consider the performance of a single-client, 3-level structure, which simulates caches distributed across the client, the server, and the disk array. We compare independent LRU (indLRU), unified LRU (uniLRU), and ULC.
ULC has the best hit rate, and independent LRU the worst, because indLRU keeps redundant blocks across the levels: in the RANDOM trace, for example, the 2nd and 3rd levels contribute little to the hit rate because of this redundancy. ULC also yields an access-time-aware hit distribution, with more hits landing at the upper, faster levels: in TPCC1, for example, ULC's first-level hits far exceed uniLRU's.
The demotion rates for ULC are much lower than for uniLRU, because uniLRU has little power to predict at which level a block will be accessed. In TPCC1, for example, most hits are at the 2nd level; instead of caching the blocks at the 2nd level as ULC does, uniLRU brings every accessed block into the 1st level and then demotes it to the 2nd.
ULC provides the lowest average access time. Three factors contribute: a small demotion cost, a high overall hit rate, and a high hit rate at the high levels. For workload HTTPD, ULC again shows the lowest average access time. Generally, indLRU gives the worst access time; once the server cache size exceeds 1 GB, uniLRU significantly increases its overall hit ratio, and its access time drops accordingly.