
Slide 1: PATH: Page Access Tracking Hardware to Improve Memory Management

Reza Azimi, Livio Soares, Michael Stumm, Tom Walsh, and Angela Demke Brown
University of Toronto, Canada

Slide 2: Page Access Tracking Challenge

Storage management research:
  - Many sophisticated algorithms
  - Most require accurate knowledge of the memory access trace
  - Adopted mostly for file systems or databases
  - Not straightforward to apply to virtual memory

Problem: limited page access tracking makes it hard to measure either reuse distance or temporal locality.

Conventional access tracking mechanisms:
  - Monitoring page faults: most page accesses are missed.
  - Scanning page table bits: high scanning overhead forces a low scanning frequency.

Slide 3: Page Access Tracking Challenge (cont'd)

Access tracking with performance counters:
  - Statistical data sampling: favours only hot pages; hard to track reuse distance or temporal locality.
  - Recording TLB misses: high overhead, because TLBs are small (TLB misses are very frequent) and TLB miss handling is performance-critical.

Hardware approach [Zhou et al., ASPLOS'04]:
  + Effective for its purpose (but inflexible)
  - Impractical hardware resource requirements: ~1 MB of hardware buffer per 1 GB of physical memory!

Software approach [Yang et al., OSDI'06]:
  - Divide pages into active and inactive sets, page-protecting members of the inactive set.
  - Overhead can still be too high.

Slide 4: Page Access Tracking in Software

[Figure: performance of adaptive page replacement for FFT vs. the runtime overhead of page access tracking in software. With a large active set the overhead is still 10% but replacement performance is poor; achieving acceptable performance costs roughly 90% overhead.]

Slide 5: Page Access Tracking Hardware (PATH)

Advantages:
  - The extra hardware resources required are small (around 10 KB)
  - Off the common path
  - Scalable (does not grow with physical memory)

[Diagram: on a TLB miss, the virtual address goes to the page tables and is also recorded by PATH in a Page Access Buffer; when the buffer overflows, an interrupt lets software drain it into an in-memory Page Access Log.]
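
To make the mechanism concrete, here is a minimal software model of the buffer-and-overflow behaviour sketched in the diagram; the names, sizes, and flush interface are illustrative assumptions, not the actual hardware or OS interface.

```python
# Minimal software model of the PATH buffer/overflow behaviour described
# above. All identifiers, sizes, and the flush interface are assumptions
# made for illustration, not the real hardware design.
PAGE_SHIFT = 12          # assume 4 KB pages
BUFFER_ENTRIES = 2048    # a "2K-entry" Page Access Buffer (see slide 13)

class PathModel:
    def __init__(self, flush_handler):
        self.buffer = []                    # stands in for the on-chip Page Access Buffer
        self.flush_handler = flush_handler  # stands in for the OS overflow-interrupt handler

    def tlb_miss(self, vaddr):
        """Called on each TLB miss: record the virtual page number."""
        self.buffer.append(vaddr >> PAGE_SHIFT)
        if len(self.buffer) >= BUFFER_ENTRIES:
            # Models the overflow interrupt: hand the batch of page numbers
            # to software (the in-memory Page Access Log) and reset.
            self.flush_handler(list(self.buffer))
            self.buffer.clear()
```

In this model, software would register a flush_handler that appends each batch to the Page Access Log and then updates the abstractions described on the following slides.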

Slide 6: Information Provided by PATH

  - Raw form
  - Abstraction: precise LRU stack
  - Abstraction: Miss Rate Curve (MRC)

[Diagram: the PATH hardware, as on the previous slide.]

Slide 7: Basic Abstraction: LRU Stack

The stack is looked up and updated for each entry in the Page Access Log.

Implementation:
  - Lookup: a page-table-like structure gives O(1) lookup time.
  - Update: a doubly linked list; only a few pointers are updated per page access.

[Diagram: the LRU stack, ordered from most recently accessed to least recently accessed; a sketch of this structure follows below.]
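
A minimal sketch of such an LRU stack, assuming a hash map stands in for the page-table-like lookup structure; identifiers are illustrative, not the paper's code.

```python
class Node:
    __slots__ = ("page", "prev", "next")
    def __init__(self, page):
        self.page, self.prev, self.next = page, None, None

class LRUStack:
    """LRU stack with O(1) lookup (a dict in place of the page-table-like
    structure) and a doubly linked list for constant-time reordering."""
    def __init__(self):
        self.lookup = {}      # page number -> Node
        self.head = None      # most recently accessed
        self.tail = None      # least recently accessed

    def access(self, page):
        node = self.lookup.get(page)
        if node is not None:
            self._unlink(node)          # a few pointer updates per access
        else:
            node = Node(page)
            self.lookup[page] = node
        self._push_front(node)

    def _unlink(self, node):
        if node.prev: node.prev.next = node.next
        else:         self.head = node.next
        if node.next: node.next.prev = node.prev
        else:         self.tail = node.prev
        node.prev = node.next = None

    def _push_front(self, node):
        node.next = self.head
        if self.head: self.head.prev = node
        self.head = node
        if self.tail is None: self.tail = node
```

Each entry drained from the Page Access Log would be fed to access(), keeping the stack consistent with the hardware-observed access order.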

Slide 8: Basic Abstraction: Miss Rate Curve (MRC)

[Figure: capacity misses as a function of memory size.]

Basic information: the number of misses for a given memory size over a period of time.

Basic use: estimating the "memory needs" of an application.

Slide 9: Computing the MRC Online: Mattson's Stack Algorithm

For LRU:
  - Memory sizes smaller than the LRU distance: miss
  - Memory sizes greater than or equal to the LRU distance: hit

[Diagram: the stack, ordered from most recently accessed to least recently accessed; a page access is characterized by its LRU distance from the top of the stack and its MRU distance from the bottom.]
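
A small sketch of Mattson's stack algorithm for an LRU-managed memory, using a plain list instead of the O(1) structures above; the function and variable names are illustrative.

```python
from collections import Counter

def mrc_from_trace(page_accesses, max_pages):
    """Mattson's stack algorithm (sketch): for each access, the LRU (reuse)
    distance is the page's depth in the stack, and the access hits iff the
    memory holds at least that many pages."""
    stack = []                  # index 0 = most recently accessed
    distance_hist = Counter()   # LRU distance -> number of accesses at that distance
    cold_misses = 0
    for page in page_accesses:
        if page in stack:
            depth = stack.index(page) + 1     # LRU distance, 1-based
            distance_hist[depth] += 1
            stack.remove(page)
        else:
            cold_misses += 1
        stack.insert(0, page)

    # For a memory of m pages, an access misses iff its LRU distance exceeds m;
    # one pass over the trace therefore yields the whole curve.
    total = max(len(page_accesses), 1)
    return {
        m: (cold_misses + sum(c for d, c in distance_hist.items() if d > m)) / total
        for m in range(1, max_pages + 1)
    }
```

For example, mrc_from_trace([1, 2, 3, 1, 2, 3], 4) gives a miss rate of 1.0 for memories of one or two pages and 0.5 once all three pages fit.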

Slide 10: Runtime Overhead Tradeoff

The larger the Page Access Buffer (active set):
  + the more page accesses are filtered, so the lower the runtime overhead;
  - the less accurate the page access trace.

[Diagram: the PATH hardware, as on slide 5.]

Slide 11: Runtime Overhead, Example: FFT

[Chart: runtime overhead (%) of PATH vs. the software approach for active sets of 128, 512, 2K, 4K, 8K, 16K, and 32K entries; the y-axis spans 0-200%.]

Slide 12: Runtime Overhead, Example: LU (non-contiguous)

[Chart: runtime overhead (%) of PATH vs. the software approach for active sets of 128, 512, 2K, 4K, 8K, 16K, and 32K entries; the y-axis spans 0-100%.]

Slide 13: Runtime Overhead Summary

Overall, a 2K-entry Page Access Buffer seems to be the best point in the tradeoff between performance and runtime overhead.

PATH's overhead is less than 6% across a wide variety of applications, and negligible in most cases.

Slide 14: Case 1: Adaptive Page Replacement

Region-based page replacement:
  - Use different replacement policies for different regions of the virtual address space.
  - Rationale: each region is likely to contain a data structure with a fairly stable access pattern.

Low Inter-reference Recency Set (LIRS):
  - Handles sequential and looping patterns.
  - Requires tracking page accesses.
  - Originally developed for file system caching.
  - Easily enabled by the PATH-generated information.

Slide 15: Region-based Replacement

Using the MRC to compare policies:

[Figure: miss rate vs. memory size for the LRU and MRU policies; a worked comparison follows below.]
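
To illustrate why comparing per-region MRCs matters, here is a small simulation (not the paper's implementation) of LRU and MRU replacement on a looping region; the trace, frame count, and helper names are illustrative assumptions.

```python
def simulate(trace, frames, policy):
    """Count the miss rate of a tiny single-region memory under LRU or MRU
    replacement. 'resident' is kept in recency order, most recent last."""
    resident, misses = [], 0
    for page in trace:
        if page in resident:
            resident.remove(page)           # refresh recency on a hit
        else:
            misses += 1
            if len(resident) >= frames:
                victim = 0 if policy == "LRU" else -1   # LRU: oldest, MRU: newest
                resident.pop(victim)
        resident.append(page)
    return misses / len(trace)

# A looping region: 64 pages scanned repeatedly, but only 32 frames available.
loop_trace = list(range(64)) * 20
for policy in ("LRU", "MRU"):
    print(policy, round(simulate(loop_trace, 32, policy), 2))
# LRU misses on essentially every access of this loop, while MRU keeps roughly
# half the loop resident, so the per-region MRC comparison would pick MRU here.
```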

Slide 16: Region-based Replacement (cont'd)

Dividing memory among regions: minimize the total miss rate by giving memory to the regions with the greatest "benefit per page" (see the allocation sketch below).

[Figure: capacity misses vs. memory size (the MRC) for Region 1 and for Region 2.]
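
One way to act on the per-region MRCs is a greedy marginal-benefit allocation; this is a hedged sketch of the idea rather than the paper's exact algorithm, and the chunk size, data layout, and names are assumptions.

```python
import heapq

def divide_memory(region_mrcs, total_pages, chunk=64):
    """Greedily hand out memory in fixed-size chunks, each time to the region
    whose MRC predicts the largest drop in misses per extra page ("benefit
    per page"). `region_mrcs` maps region name -> {allocated pages: expected
    misses}, sampled at multiples of `chunk`. Illustrative sketch only."""
    alloc = {r: 0 for r in region_mrcs}

    def benefit_per_page(region):
        mrc, cur = region_mrcs[region], alloc[region]
        here = mrc.get(cur, 0.0)
        there = mrc.get(cur + chunk, here)   # no data -> assume no further benefit
        return (here - there) / chunk

    # Max-heap on benefit (negated, since heapq is a min-heap).
    heap = [(-benefit_per_page(r), r) for r in region_mrcs]
    heapq.heapify(heap)
    remaining = total_pages
    while remaining >= chunk and heap:
        neg_b, region = heapq.heappop(heap)
        if -neg_b <= 0:
            continue                 # this region gains nothing from more pages
        alloc[region] += chunk
        remaining -= chunk
        heapq.heappush(heap, (-benefit_per_page(region), region))
    return alloc
```

The greedy choice is exactly optimal only when each MRC is convex, but it captures the benefit-per-page intuition from the slide.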

Slide 17: Simulation Results

LU-contiguous (SPLASH-2)

Slide 18: Simulation Results

BT (NAS benchmark)

Slide 19: Case 2: Prefetching

Spatial locality-based:
  - Prefetch pages spatially adjacent to the faulted page.
  - Advantages: simple and easy to implement; effective in many cases.
  - Major drawback: oblivious to non-spatial access patterns.

Temporal locality-based:
  - Prefetch pages that are regularly accessed together.
  - Use PATH to track the temporal locality of pages.

Slide 20: Temporal Locality-based Prefetching

Page Proximity Graph (PPG):
  - Each page is a node.
  - There is an edge from p to q if q is regularly accessed shortly after p (temporal locality).

PPG update: add page q to p's proximity set if q repeatedly appears in the LRU stack in close proximity to p.

Basic prefetching scheme: breadth-first traversal of the PPG starting from the faulted page (see the sketch below).
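
A hedged sketch of the PPG idea: it uses a window of the page access log as a stand-in for "close proximity in the LRU stack", and the window size, edge threshold, prefetch budget, and names are assumptions rather than the paper's tuned values.

```python
from collections import defaultdict, deque

class ProximityGraph:
    """Illustrative Page Proximity Graph: edge weights count how often two
    pages appear near each other in the observed access order."""
    WINDOW = 8        # pages considered "in close proximity" (assumed)
    THRESHOLD = 3     # repetitions before an edge counts as "regular" (assumed)

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, access_log):
        """Record co-proximity for each sliding window of the access log."""
        for i, p in enumerate(access_log):
            for q in access_log[i + 1 : i + 1 + self.WINDOW]:
                if q != p:
                    self.counts[p][q] += 1

    def prefetch_set(self, faulted_page, budget=16):
        """Breadth-first traversal from the faulted page over 'regular' edges."""
        result, seen = [], {faulted_page}
        queue = deque([faulted_page])
        while queue and len(result) < budget:
            p = queue.popleft()
            for q, count in self.counts.get(p, {}).items():
                if count >= self.THRESHOLD and q not in seen:
                    seen.add(q)
                    result.append(q)
                    queue.append(q)
        return result
```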

Slide 21: Prefetching Results

LU non-contiguous (SPLASH-2)

Slide 22: Conclusions

Page Access Tracking Hardware (PATH) is:
  - Small (about 10 KB of hardware state)
  - Low-overhead
  - Generic

Cases studied:
  - Adaptive page replacement
  - Process memory allocation (see the paper)
  - Prefetching

Significant performance improvements can be achieved by tracking page accesses.

Slide 23: Future Directions

Other case studies:
  - NUMA page placement
  - Superpage management

Per-thread page access tracking: augmenting page accesses with thread information.

Multiprocessor issues: combining traces collected on multiple CPUs.

Slide 24: Questions