Shared Last-Level TLBs for Chip Multiprocessors
Abhishek Bhattacharjee, Daniel Lustig, Margaret Martonosi
HPCA 2011
Presented by: Apostolos Kotsiolis
CS 7123 – Research Seminar
Translation Lookaside Buffer
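As a rough illustration of the slide's subject: a TLB is a small cache of recent virtual-page-to-physical-frame translations that sits in front of the page-table walker. The sketch below is a minimal model, not the paper's design; the capacity, the `page_table_walk` stand-in, and all names are invented for this example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096
TLB_ENTRIES = 64
tlb = OrderedDict()  # vpn -> pfn, kept in LRU order


def page_table_walk(vpn):
    # Stand-in for walking the real page tables on a TLB miss.
    return vpn ^ 0x80000


def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:
        tlb.move_to_end(vpn)           # TLB hit: refresh LRU position
        pfn = tlb[vpn]
    else:
        pfn = page_table_walk(vpn)     # TLB miss: walk, then fill
        tlb[vpn] = pfn
        if len(tlb) > TLB_ENTRIES:
            tlb.popitem(last=False)    # evict the LRU entry
    return pfn * PAGE_SIZE + offset


pa1 = translate(0x1234)  # first access to the page: miss, walks the tables
pa2 = translate(0x1ABC)  # same page, different offset: served from the TLB
```

Both accesses land in the same virtual page, so only the first pays for a page-table walk; the second is the fast-path hit the TLB exists to provide.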
Contribution
◦SLL TLB design explored for the first time
◦Analysis of SLL TLB benefits for parallel programs
◦Analysis of multiprogrammed workloads consisting of sequential applications
Previous and Related Work
Private Multilevel TLB Hierarchies
◦Intel i7, AMD K7/K8/K10, SPARC64-III
◦No sharing between cores
◦Waste of resources
Inter-Core Cooperative Prefetching
◦Two types of predictable misses:
◦Inter-Core Shared (ICS): Leader-Follower Prefetching
◦Inter-Core Predictable Stride (ICPS): Distance-Based Cross-Core Prefetching
Shared Last-Level TLBs
◦Exploit inter-core sharing in parallel programs
◦Flexible regarding where entries can be placed
◦Both parallel and sequential workloads benefit
◦Greater hit rate
◦CPU performance boosted
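The inter-core sharing point above can be made concrete with a small model: each core keeps a private L1 TLB, but all cores miss into one shared last-level TLB, so a translation filled by one core can later be hit by another. Everything here (class names, capacities, the walk stand-in) is illustrative, not the paper's actual hardware organization.

```python
from collections import OrderedDict


class TLB:
    """Tiny LRU TLB model: maps virtual page numbers to physical frames."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> pfn

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # refresh LRU position
            return self.entries[vpn]
        return None

    def fill(self, vpn, pfn):
        self.entries[vpn] = pfn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict LRU entry


class Core:
    """Core with a private L1 TLB backed by one shared last-level TLB."""

    def __init__(self, sll):
        self.l1 = TLB(capacity=4)
        self.sll = sll

    def translate(self, vpn):
        pfn = self.l1.lookup(vpn)
        if pfn is not None:
            return pfn, "L1 hit"
        pfn = self.sll.lookup(vpn)
        if pfn is not None:
            self.l1.fill(vpn, pfn)         # refill private L1 from the SLL
            return pfn, "SLL hit"
        pfn = vpn + 1000                   # stand-in for a page-table walk
        self.sll.fill(vpn, pfn)            # fill the shared level on the walk
        self.l1.fill(vpn, pfn)
        return pfn, "miss"


sll = TLB(capacity=16)
core0, core1 = Core(sll), Core(sll)
_, r0 = core0.translate(42)  # cold miss: core0 walks the page table
_, r1 = core1.translate(42)  # core1 hits in the SLL entry core0 filled
```

Core1 never walks the page table for page 42: the shared level turns core0's miss into core1's hit, which is exactly the sharing a private-per-core L2 TLB cannot exploit.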
Shared Last-Level TLBs
Shared Last-Level TLBs with simple Stride Prefetching
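A simplified sketch of the stride-prefetching idea named in the slide: on a shared-TLB miss, also install translations for the next few strided virtual pages, so a core streaming through memory finds its neighbors already present. The degree, stride, and `walk` stand-in are illustrative parameters chosen for this example, not the paper's configuration.

```python
PREFETCH_DEGREE = 2  # extra strided entries installed per miss (illustrative)
STRIDE = 1           # stride in pages (illustrative)


def walk(vpn):
    return vpn + 1000  # stand-in for a page-table walk


def sll_translate(sll, vpn):
    """Look up vpn in the shared TLB; on a miss, fill it and prefetch
    the next PREFETCH_DEGREE strided pages into the shared level."""
    if vpn in sll:
        return sll[vpn], True            # SLL hit
    sll[vpn] = walk(vpn)                 # demand fill
    for i in range(1, PREFETCH_DEGREE + 1):
        nxt = vpn + i * STRIDE
        sll.setdefault(nxt, walk(nxt))   # prefetch, don't clobber entries
    return sll[vpn], False               # SLL miss


sll = {}  # dict stands in for the shared last-level TLB
_, hit0 = sll_translate(sll, 100)  # miss: prefetches pages 101 and 102
_, hit1 = sll_translate(sll, 101)  # hit, thanks to the stride prefetch
```

The second access hits without a walk because the first miss pulled in its strided neighbors; that conversion of predictable misses into hits is the benefit the combined SLL-plus-prefetching design targets.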
Methodology
Two distinct evaluation sets:
◦Parallel applications
◦A different sequential application on each core
Methodology
Benchmarks
SLL TLBs: Parallel Workload Results
SLL TLBs versus Private L2 TLBs
SLL TLBs: Parallel Workload Results
SLL TLBs versus ICC Prefetching
SLL TLBs: Parallel Workload Results
SLL TLBs versus ICC Prefetching
SLL TLBs: Parallel Workload Results
SLL TLBs with Simple Stride Prefetching
SLL TLBs: Parallel Workload Results
SLL TLBs at Higher Core Counts
SLL TLBs: Parallel Workload Results
Performance Analysis
SLL TLBs: Multiprogrammed Workload Results
Multiprogrammed Workloads with One Application Pinned per Core
SLL TLBs: Multiprogrammed Workload Results
Performance Analysis
Conclusion – Benefits
On parallel workloads:
◦Eliminates 7-79% of L1 TLB misses by exploiting parallel-program inter-core sharing
◦Outperforms conventional per-core private L2 TLBs by an average of 27%
◦Improves CPI by up to 0.25
On multiprogrammed sequential workloads:
◦Improves over private L2 TLBs by an average of 21%
◦Improves CPI by up to 0.4
Thank You!
Questions?