Shared Last-Level TLBs for Chip Multiprocessors
Abhishek Bhattacharjee, Daniel Lustig, Margaret Martonosi
HPCA 2011
Presented by: Apostolos Kotsiolis
CS 7123 – Research Seminar
Translation Lookaside Buffer
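As a rough illustration of the slide's subject: a TLB is a small cache of recent virtual-page-to-physical-frame translations that sits in front of the page-table walker. The sketch below is a minimal model, not the paper's design; the capacity, the `page_table_walk` stand-in, and all names are invented for this example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096
TLB_ENTRIES = 64
tlb = OrderedDict()  # vpn -> pfn, kept in LRU order


def page_table_walk(vpn):
    # Stand-in for walking the real page tables on a TLB miss.
    return vpn ^ 0x80000


def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:
        tlb.move_to_end(vpn)           # TLB hit: refresh LRU position
        pfn = tlb[vpn]
    else:
        pfn = page_table_walk(vpn)     # TLB miss: walk, then fill
        tlb[vpn] = pfn
        if len(tlb) > TLB_ENTRIES:
            tlb.popitem(last=False)    # evict the LRU entry
    return pfn * PAGE_SIZE + offset


pa1 = translate(0x1234)  # first access to the page: miss, walks the tables
pa2 = translate(0x1ABC)  # same page, different offset: served from the TLB
```

Both accesses land in the same virtual page, so only the first pays for a page-table walk; the second is the fast-path hit the TLB exists to provide.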
Contribution
◦SLL TLB design explored for the first time
◦Analysis of SLL TLB benefits for parallel programs
◦Analysis of multiprogrammed workloads consisting of sequential applications
Previous and Related Work
Private Multilevel TLB Hierarchies
◦Intel i7, AMD K7/K8/K10, SPARC64-III
◦No sharing between cores
◦Waste of resources
Inter-Core Cooperative Prefetching
◦Two types of predictable misses:
◦Inter-Core Shared (ICS): Leader-Follower Prefetching
◦Inter-Core Predictable Stride (ICPS): Distance-Based Cross-Core Prefetching
Shared Last-Level TLBs
◦Exploit inter-core sharing in parallel programs
◦Flexible regarding where entries can be placed
◦Both parallel and sequential workloads benefit
◦Greater hit rate
◦CPU performance boosted
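The inter-core sharing point above can be made concrete with a small model: each core keeps a private L1 TLB, but all cores miss into one shared last-level TLB, so a translation filled by one core can later be hit by another. Everything here (class names, capacities, the walk stand-in) is illustrative, not the paper's actual hardware organization.

```python
from collections import OrderedDict


class TLB:
    """Tiny LRU TLB model: maps virtual page numbers to physical frames."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> pfn

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # refresh LRU position
            return self.entries[vpn]
        return None

    def fill(self, vpn, pfn):
        self.entries[vpn] = pfn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict LRU entry


class Core:
    """Core with a private L1 TLB backed by one shared last-level TLB."""

    def __init__(self, sll):
        self.l1 = TLB(capacity=4)
        self.sll = sll

    def translate(self, vpn):
        pfn = self.l1.lookup(vpn)
        if pfn is not None:
            return pfn, "L1 hit"
        pfn = self.sll.lookup(vpn)
        if pfn is not None:
            self.l1.fill(vpn, pfn)         # refill private L1 from the SLL
            return pfn, "SLL hit"
        pfn = vpn + 1000                   # stand-in for a page-table walk
        self.sll.fill(vpn, pfn)            # fill the shared level on the walk
        self.l1.fill(vpn, pfn)
        return pfn, "miss"


sll = TLB(capacity=16)
core0, core1 = Core(sll), Core(sll)
_, r0 = core0.translate(42)  # cold miss: core0 walks the page table
_, r1 = core1.translate(42)  # core1 hits in the SLL entry core0 filled
```

Core1 never walks the page table for page 42: the shared level turns core0's miss into core1's hit, which is exactly the sharing a private-per-core L2 TLB cannot exploit.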
Shared Last-Level TLBs
Shared Last-Level TLBs with simple Stride Prefetching
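A simplified sketch of the stride-prefetching idea named in the slide: on a shared-TLB miss, also install translations for the next few strided virtual pages, so a core streaming through memory finds its neighbors already present. The degree, stride, and `walk` stand-in are illustrative parameters chosen for this example, not the paper's configuration.

```python
PREFETCH_DEGREE = 2  # extra strided entries installed per miss (illustrative)
STRIDE = 1           # stride in pages (illustrative)


def walk(vpn):
    return vpn + 1000  # stand-in for a page-table walk


def sll_translate(sll, vpn):
    """Look up vpn in the shared TLB; on a miss, fill it and prefetch
    the next PREFETCH_DEGREE strided pages into the shared level."""
    if vpn in sll:
        return sll[vpn], True            # SLL hit
    sll[vpn] = walk(vpn)                 # demand fill
    for i in range(1, PREFETCH_DEGREE + 1):
        nxt = vpn + i * STRIDE
        sll.setdefault(nxt, walk(nxt))   # prefetch, don't clobber entries
    return sll[vpn], False               # SLL miss


sll = {}  # dict stands in for the shared last-level TLB
_, hit0 = sll_translate(sll, 100)  # miss: prefetches pages 101 and 102
_, hit1 = sll_translate(sll, 101)  # hit, thanks to the stride prefetch
```

The second access hits without a walk because the first miss pulled in its strided neighbors; that conversion of predictable misses into hits is the benefit the combined SLL-plus-prefetching design targets.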
Methodology
Two distinct evaluation sets:
◦Parallel applications
◦A different sequential application on each core
Methodology
Benchmarks
SLL TLBs: Parallel Workload Results
SLL TLBs versus Private L2 TLBs
SLL TLBs: Parallel Workload Results
SLL TLBs versus ICC Prefetching
SLL TLBs: Parallel Workload Results
SLL TLBs versus ICC Prefetching
SLL TLBs: Parallel Workload Results
SLL TLBs with Simple Stride Prefetching
SLL TLBs: Parallel Workload Results
SLL TLBs at Higher Core Counts
SLL TLBs: Parallel Workload Results
Performance Analysis
SLL TLBs: Multiprogrammed Workload Results
Multiprogrammed Workloads with One Application Pinned per Core
SLL TLBs: Multiprogrammed Workload Results
Performance Analysis
Conclusion – Benefits
On parallel workloads:
◦Eliminates 7-79% of L1 TLB misses by exploiting parallel-program inter-core sharing
◦Outperforms conventional per-core private L2 TLBs by an average of 27%
◦Improves CPI by up to 0.25
On multiprogrammed sequential workloads:
◦Improves over private L2 TLBs by an average of 21%
◦Improves CPI by up to 0.4
Thank You!
Questions?