An Imitation Learning Approach for Cache Replacement

Evan Z. Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn

The Need for Faster Compute

[Figure: compute used in AI training over time; source: https://openai.com/blog/ai-and-compute/]

Small cache improvements can make large differences! (Beckman, 2019)
● E.g., a 1% cache hit rate improvement → 35% decrease in latency (Cidon et al., 2016)

Caches are everywhere:
● CPU chips
● Operating systems
● Databases
● Web applications

Our goal: Faster applications via better cache replacement policies

TL;DR:

I. We approximate the optimal cache replacement policy by (implicitly) predicting the future

II. Caching is an attractive benchmark for the general reinforcement learning / imitation learning communities

Cache Replacement

[Figure: a cache holding lines A, B, C; an access to line D misses, so a cached line must be evicted (a hit is served ~100x faster than a miss).]

Goal: Evict the cache lines to maximize cache hits

Cache Replacement

[Figure: worked example on the same access sequence. One eviction choice is a mistake: the evicted line is accessed again shortly after, turning a would-be hit into a miss. A different choice, the optimal decision, avoids the extra miss.]

Cache Replacement

[Figure: the same example, annotated with each line's reuse distance.]

Reuse distance dt(line): the number of accesses from access t until the line is reused
● In the example: d0(A) = 1, d0(B) > 2, d0(C) = 2

Optimal Policy (Belady’s): Evict the line with the greatest reuse distance (Belady, 1966)

Belady’s Requires Future Information

Reuse distance dt(line): number of accesses from access t until the line is reused

Problem: Computing reuse distance requires knowing the future

So in practice, we use heuristics, e.g.:
● Least-recently used (LRU)
● Most-recently used (MRU)

… but these perform poorly on complex access patterns
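For contrast, a minimal LRU sketch (illustrative only): it needs no future information, but its recency ordering can be a poor proxy for reuse distance on complex access patterns.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # ordered from least- to most-recently used

    def access(self, addr):
        # Returns True on a hit, False on a miss (evicting the LRU line if full).
        if addr in self.lines:
            self.lines.move_to_end(addr)        # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)      # evict the least recently used line
        self.lines[addr] = None
        return False

cache = LRUCache(capacity=3)
hits = sum(cache.access(a) for a in ["A", "B", "C", "D", "A", "C"])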

Leveraging Belady’s

Idea: approximate Belady’s from past accesses

[Figure: the learned model sees only the past accesses and the current access, while Belady's sees the future accesses; training matches the model's predicted decision to Belady's optimal decision.]

Prior Work

Hawkeye / Glider: current state-of-the-art (Shi et al., 2019; Jain et al., 2018)
● A learned predictor takes the past accesses and the current access and classifies the current line as cache-friendly or cache-averse, trained on Belady's
● A traditional algorithm then combines this prediction with the current cache state to decide which line to evict (Evict line X)

Prior Work

+ binary classification is relatively easy to learn
- the traditional algorithm can't express the optimal policy

[Same Hawkeye / Glider pipeline as on the previous slide (Shi et al., 2019; Jain et al., 2018).]

Our Approach

Our contribution: directly approximate Belady's via imitation learning
● A single model takes the past accesses, the current access, and the current cache state, and directly outputs the eviction decision (Evict line X), trained on Belady's
● This replaces the two-stage Hawkeye / Glider pipeline (a binary friendly/averse prediction followed by a traditional algorithm) with a single learned policy

Cache Replacement Markov Decision Process

Similar to Wang et al., 2019
● State: the past accesses, the current access, and the current cache contents
● Action: which cached line to evict on a miss
● Rewards come from the resulting cache hits and misses

[Figure: the running cache example annotated as an MDP: accesses and cache contents form the state, and the eviction is the action. A rough environment sketch follows below.]
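As an illustration of this formulation, below is a hypothetical Gym-style environment for the cache-replacement MDP; the interface, class name, and unit reward per hit are assumptions for illustration, not the authors' released environment.

import random

class CacheReplacementEnv:
    # Hypothetical Gym-style environment for the cache-replacement MDP.
    def __init__(self, trace, capacity=3):
        self.trace, self.capacity = trace, capacity

    def reset(self):
        self.t, self.cache, self.history = 0, [], []
        return self._state()

    def _state(self):
        # State: past accesses, current access, and current cache contents.
        return {"past": list(self.history),
                "current": self.trace[self.t],
                "cache": list(self.cache)}

    def step(self, evict_idx=None):
        addr, reward = self.trace[self.t], 0
        if addr in self.cache:
            reward = 1                           # cache hit
        else:
            if len(self.cache) >= self.capacity:
                self.cache.pop(evict_idx)        # action: index of the line to evict
            self.cache.append(addr)              # insert the missing line
        self.history.append(addr)
        self.t += 1
        done = self.t >= len(self.trace)
        return (None if done else self._state()), reward, done, {}

# Rollout with a random eviction policy on a toy trace.
env = CacheReplacementEnv(["A", "B", "C", "D", "A", "C"])
state, hits = env.reset(), 0
while state is not None:
    action = random.randrange(len(state["cache"])) if state["cache"] else None
    state, reward, done, _ = env.step(action)
    hits += reward
print("hits:", hits)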

Leveraging the Optimal Policy

Typical imitation learning setting (Pomerleau, 1991; Ross et al., 2011; Kim et al., 2013)

[Diagram: the learned policy maps each state to an action and is optimized, e.g. by matching the optimal action at every state, so that the learned policy approximates the optimal policy.]

Observation: Not all errors are equally bad
● Learning from the optimal policy yields greater training signal

Concretely: minimize a ranking loss, ranking the cached lines by their reuse distances
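As one possible instantiation, here is a minimal PyTorch sketch of a pairwise ranking loss over eviction candidates, assuming Belady's reuse distances are available at training time; the margin formulation and values are illustrative, not necessarily the paper's exact loss.

import torch
import torch.nn.functional as F

def pairwise_ranking_loss(scores, reuse_distances, margin=1.0):
    # scores[i]: the model's eviction score for cached line i.
    # reuse_distances[i]: Belady's reuse distance for line i.
    # A line reused further in the future should receive a higher eviction score.
    losses = []
    n = scores.shape[0]
    for i in range(n):
        for j in range(n):
            if reuse_distances[i] > reuse_distances[j]:
                losses.append(F.relu(margin - (scores[i] - scores[j])))
    return torch.stack(losses).mean()

scores = torch.tensor([0.2, 1.5, 0.7], requires_grad=True)  # toy scores for lines A, B, C
reuse = torch.tensor([3.0, 10.0, 5.0])                      # B is reused furthest in the future
loss = pairwise_ranking_loss(scores, reuse)
loss.backward()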

Reuse Distance as an Auxiliary Task

Observation: predicting reuse distance is correlated with cache replacement
● Cast this as an auxiliary task (Jaderberg et al., 2016)

[Diagram: the state st is mapped to a state embedding, which feeds both the eviction policy and the auxiliary reuse-distance prediction; both terms contribute to the training loss.]
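A minimal PyTorch sketch of the auxiliary-task idea, using a plain cross-entropy policy loss for brevity (the ranking loss from the previous slide could be substituted); the architecture, loss weight, and toy targets are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyWithAuxiliary(nn.Module):
    def __init__(self, state_dim, hidden_dim, num_lines):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.policy_head = nn.Linear(hidden_dim, num_lines)  # eviction scores
        self.reuse_head = nn.Linear(hidden_dim, num_lines)   # predicted reuse distances

    def forward(self, state):
        h = self.encoder(state)                               # shared state embedding
        return self.policy_head(h), self.reuse_head(h)

model = PolicyWithAuxiliary(state_dim=32, hidden_dim=64, num_lines=4)
state = torch.randn(8, 32)                # toy batch of featurized cache states
target_evict = torch.randint(0, 4, (8,))  # Belady's eviction choices (toy)
target_reuse = torch.rand(8, 4) * 100     # Belady's reuse distances (toy)

scores, reuse_pred = model(state)
policy_loss = F.cross_entropy(scores, target_evict)
aux_loss = F.mse_loss(reuse_pred, target_reuse)
loss = policy_loss + 0.5 * aux_loss       # auxiliary weight is illustrative
loss.backward()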

Results

[Plot: cache-hit rates, shown relative to the LRU cache-hit rate and the optimal (Belady's) cache-hit rate.]

~19% cache-hit rate increase over Glider (Shi et al., 2019) on memory-intensive SPEC2006 applications (Jaleel et al., 2009)

~64% cache-hit rate increase over LRU on Google Web Search

A Note on Practicality

This work: establish a proof-of-concept

[Diagram: the address (e.g., 0x ... C5 A1) is split into bytes (Byte 1, Byte 2, Byte 3, ...); each byte is embedded separately, and a linear layer combines the per-byte embeddings into the address embedding. A sketch of this follows below.]

Per-byte address embedding:
● Reduces embedding size from 100MB to <10KB
● ~6% cache-hit rate increase on SPEC2006 vs. Glider
● ~59% cache-hit rate increase on Google Web Search vs. LRU
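A minimal PyTorch sketch of a per-byte address embedding; the byte count, dimensions, and class name are illustrative assumptions. The point is that a few 256-row tables replace one table with a row per distinct address, which is the source of the size reduction described above.

import torch
import torch.nn as nn

class PerByteAddressEmbedding(nn.Module):
    def __init__(self, num_bytes=8, byte_dim=8, out_dim=64):
        super().__init__()
        # One small embedding table (256 entries) per byte position.
        self.byte_embeddings = nn.ModuleList(
            [nn.Embedding(256, byte_dim) for _ in range(num_bytes)])
        self.proj = nn.Linear(num_bytes * byte_dim, out_dim)

    def forward(self, addresses):
        # addresses: (batch,) integer tensor of memory addresses.
        embs = []
        for i, table in enumerate(self.byte_embeddings):
            byte_i = (addresses >> (8 * i)) & 0xFF  # extract the i-th byte
            embs.append(table(byte_i))
        return self.proj(torch.cat(embs, dim=-1))   # combined address embedding

embed = PerByteAddressEmbedding()
addrs = torch.tensor([0x12C5A1, 0x7FFE00])
vec = embed(addrs)  # shape (2, 64)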

Future work: production-ready learned policies
● Smaller models via distillation (Hinton et al., 2015), pruning (Janowsky, 1989; Han et al., 2015; Sze et al., 2017), or quantization
● Target domains with longer latency and larger caches (e.g., software caches)

A New Imitation / Reinforcement Learning Benchmark

● Games (Bellemare et al., 2012; Silver et al., 2017; OpenAI, 2019; Vinyals et al., 2019): + plentiful data, - delayed real-world utility
● Robotics (Levine et al., 2016; Lillicrap et al., 2015): - limited / expensive data, + immediate real-world impact
● Cache replacement: + plentiful data, + immediate real-world impact

[Figure: the cache-replacement example from earlier, shown as the new benchmark task.]

Open-source cache replacement Gym environment coming soon!

Takeaways

● A new state-of-the-art approach for cache replacement by imitating the oracle policy
○ Future work: making this production ready

● A new benchmark for imitation learning / reinforcement learning research
