Lecture 25: Wrap-Up


• Mid-term II stats: High 91, Mean 73.12

• Qs 1-3: half the class got 25/25

• Q 4: only one student got 25/25; almost no one mentioned that we’ll need a mechanism to determine exclusivity

• Q 5: highest was 22/30; very few mentioned that allowing blocks to move would complicate search


Example Solutions


[Figure: eight CPUs (CPU 0 through CPU 7), each with its own L1D and L1I caches]


Tetris?!


Non-Uniform Cache Access (NUCA)

• Many open problems in NUCA and D-NUCA
– How should search happen?
– Allocation/replacement/migration policies
– Managing bandwidth/latency on the network
– Prefetch mechanisms
– Selective replication of blocks
– Efficient write-throughs
– Power/performance trade-offs

• P.S. We have simulators, etc., to help model such caches in case anyone is interested
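The search and migration questions above can be made concrete with a toy model. The sketch below is not one of the course simulators; it assumes a single hypothetical bank set (`BankSet`) with one block per bank, an incremental closest-first search, and a promote-by-one-bank policy on hits. It illustrates why letting blocks move complicates search: the block may sit in any bank, so a lookup can cost several probes.

```python
class BankSet:
    """Toy D-NUCA bank set (hypothetical): banks ordered closest (0) to farthest."""

    def __init__(self, num_banks):
        # Each bank holds at most one block tag in this model (None = empty).
        self.banks = [None] * num_banks

    def incremental_search(self, tag):
        """Probe banks closest-first; return (bank index or None, probes used)."""
        for i, resident in enumerate(self.banks):
            if resident == tag:
                return i, i + 1
        return None, len(self.banks)

    def access(self, tag):
        """Return (hit?, probes). On a hit, migrate the block one bank closer."""
        idx, probes = self.incremental_search(tag)
        if idx is None:
            # Miss: place the block in the farthest bank, evicting its occupant.
            self.banks[-1] = tag
            return False, probes
        if idx > 0:
            # Hit: swap with the neighbor that is one step closer to the CPU.
            self.banks[idx - 1], self.banks[idx] = self.banks[idx], self.banks[idx - 1]
        return True, probes


bs = BankSet(4)
for _ in range(3):
    print(bs.access("A"))
```

Repeated accesses pull the block closer, so later hits need fewer probes; a multicast search would trade those serial probes for extra network traffic.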


Shameless Plug

• CS 7810: Advanced Architecture

• Lectures based on seminal (and still relevant) papers

• Not much work, apart from class project (in teams)

• Class project can involve as little as 1 week’s worth of concentrated effort…

• … or, enough to get a paper out of it
– you WILL work on novel problems
– lots of help from me/other students with the simulator


3-D

• Imagine a similar problem in 3D

[Figure: three stacked layers, each a grid of tiles labeled C and P]


Must schedule threads to manage temperature
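One simple way to picture temperature-aware scheduling is a greedy pairing of the most power-hungry threads with the coolest cores. This is only an illustrative sketch, not a policy from the lecture: the function name `assign_threads`, the temperature readings, and the per-thread power estimates are all assumptions, and a real 3D scheduler would also have to model heat flowing between stacked layers.

```python
def assign_threads(core_temps, thread_power):
    """Greedy sketch: hottest-running thread goes to the coolest core.

    core_temps:   dict of core name -> current temperature (assumed sensor data)
    thread_power: dict of thread name -> estimated power draw (assumed)
    Returns a dict of thread name -> core name. If there are more threads than
    cores, the lowest-power threads are left unassigned.
    """
    cool_first = sorted(core_temps, key=core_temps.get)
    hot_first = sorted(thread_power, key=thread_power.get, reverse=True)
    return dict(zip(hot_first, cool_first))


placement = assign_threads(
    {"c0": 80, "c1": 60, "c2": 70},
    {"t0": 5, "t1": 20},
)
print(placement)
```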


Single Thread Performance

• To improve single-thread performance, can even schedule a single thread’s instructions across cores
– large window of in-flight instructions to mine high ILP
– requires high levels of speculation (power-hungry!)
– any solutions?

[Figure: three stacked layers, each a grid of tiles labeled C and P]


Heterogeneous CMPs (Alpha EVx and Cell)

[Figure: heterogeneous CMP with tiles labeled o-o-o (out-of-order) and in-o (in-order)]


NASCAR Applied to CPUs !?!


Source: Eric Rotenberg (NCSU)


Runahead Execution

Single thread in a baseline architecture

Single thread executing in tandem with a helper thread


Reliability

[Figure: SMT core 1 and SMT core 2, with thread labels P1, C2, P2, C1]

For power

For performance
