1
Lecture 25: Wrap-Up
• Midterm II stats: high 91, mean 73.12
• Qs 1-3: half the class got 25/25
• Q 4: only one student got 25/25; almost no one mentioned that we’ll need a mechanism to determine exclusivity
• Q 5: highest was 22/30; very few mentioned that allowing blocks to move would complicate search
2
Example Solutions
8
[Figure: eight cores, CPU 0 through CPU 7, each with its own L1D and L1I caches]
9
Tetris?!
10
Non-Uniform Cache Access (NUCA)
• Many open problems in NUCA and D-NUCA
  – How should search happen?
  – Allocation/replacement/migration policies
  – Managing bandwidth/latency on the network
  – Prefetch mechanisms
  – Selective replication of blocks
  – Efficient write-throughs
  – Power/performance trade-offs
• P.S. We have simulators, etc., to help model such caches in case anyone is interested
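To make the search and migration questions concrete, here is a minimal sketch of one possible D-NUCA policy, not a description of any particular design from the lecture. It assumes a single bank set whose banks are ordered by distance from the core, one block per bank, incremental search from the nearest bank outward, promotion by one bank on a hit, and insertion of new blocks in the farthest bank. The class name `BankSet` and the probe-count cost model are hypothetical.

```python
class BankSet:
    """One D-NUCA bank set: banks[0] is nearest the core, banks[-1] farthest."""

    def __init__(self, num_banks):
        # Simplifying assumption: each bank holds at most one block.
        self.banks = [None] * num_banks

    def access(self, tag):
        """Look up `tag`; return (hit, banks_probed) and update placement."""
        # Incremental search: probe the nearest bank first, then move outward.
        for i, resident in enumerate(self.banks):
            if resident == tag:
                if i > 0:
                    # Migration on hit: swap with the block one bank closer.
                    self.banks[i - 1], self.banks[i] = (
                        self.banks[i],
                        self.banks[i - 1],
                    )
                return True, i + 1
        # Miss: place the new block in the farthest bank, evicting its occupant.
        self.banks[-1] = tag
        return False, len(self.banks)


if __name__ == "__main__":
    bank_set = BankSet(4)
    for _ in range(3):
        print(bank_set.access("A"))
    print(bank_set.banks)
```

Repeated hits pull a block toward the core, so later accesses probe fewer banks. The cost is that a block's location is no longer fixed, which is why search becomes an open problem once blocks are allowed to move.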
11
Shameless Plug
• CS 7810: Advanced Architecture
• Lectures based on seminal (and still relevant) papers
• Not much work, apart from class project (in teams)
• Class project can involve as little as 1 week’s worth of concentrated effort…
• … or, enough to get a paper out of it
  – you WILL work on novel problems
  – lots of help from me/other students with the simulator
12
3-D
• Imagine a similar problem in 3D
[Figure: layers of tiles labeled C and P]
Must schedule threads to manage temperature
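As an illustration of what temperature-aware scheduling could look like, the sketch below greedily places threads in a 3-D stack so that no vertical column of cores accumulates too much power. It is a toy heuristic under stated assumptions, not a method from the lecture: per-thread power estimates are taken as given, column power stands in for temperature, and the function name `schedule` and its parameters are hypothetical.

```python
def schedule(thread_power, layers, columns):
    """Greedily map threads to cores in a 3-D stack.

    thread_power: dict of thread name -> estimated power (arbitrary units).
    layers, columns: stack dimensions; a core is identified by (layer, column).
    Returns a dict of thread name -> (layer, column).
    """
    if len(thread_power) > layers * columns:
        raise ValueError("more threads than cores")

    column_power = [0.0] * columns  # power already placed in each vertical column
    next_layer = [0] * columns      # next free layer in each column
    placement = {}

    # Place the most power-hungry threads first, each in the column that
    # currently carries the least power and still has a free core.
    ordered = sorted(thread_power.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, power in ordered:
        free = [c for c in range(columns) if next_layer[c] < layers]
        col = min(free, key=lambda c: (column_power[c], c))
        placement[name] = (next_layer[col], col)
        next_layer[col] += 1
        column_power[col] += power
    return placement


if __name__ == "__main__":
    demo = {"t0": 10, "t1": 8, "t2": 2, "t3": 1}
    print(schedule(demo, layers=2, columns=2))
```

In the demo, the two hottest threads land in different columns and the cooler threads are stacked on top of them, which balances the column totals at 11 and 10. A real scheduler would also need measured temperatures, lateral heat flow, and the cost of migrating threads.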
14
Single Thread Performance
• To improve single-thread performance, can even schedule a single thread’s instructions across cores
  – large window of in-flight instructions to mine high ILP
  – requires high levels of speculation (power-hungry!)
  – any solutions?
[Figure: layers of tiles labeled C and P]
15
Heterogeneous CMPs (Alpha EVx and Cell)
[Figure: cores labeled o-o-o (out-of-order) and in-o (in-order)]
16
NASCAR Applied to CPUs !?!
Source: Eric Rotenberg (NCSU)
17
Runahead Execution
Single thread in a baseline architecture
Single thread executing in tandem with a helper thread
18
Reliability
[Figure: SMT core 1 and SMT core 2 running threads labeled P1, C2, P2, C1; panels labeled "For power" and "For performance"]