Lecture 25: Wrap-Up

Page 1: Lecture 25: Wrap-Up

Lecture 25: Wrap-Up

• Mid-term-II stats: High 91 Mean 73.12

• Qs 1-3: half the class got 25/25

• Q4: only one student got 25/25; almost no one mentioned that we'll need a mechanism to determine exclusivity

• Q5: highest was 22/30; very few mentioned that allowing blocks to move would complicate search

Pages 2-7: Lecture 25: Wrap-Up

Example Solutions

Page 8: Lecture 25: Wrap-Up

Figure: eight cores, CPU 0 to CPU 7, each with its own L1 data (L1D) and L1 instruction (L1I) cache

Page 9: Lecture 25: Wrap-Up

Tetris?!

Page 10: Lecture 25: Wrap-Up

Non-Uniform Cache Access (NUCA)

• Many open problems in NUCA and D-NUCA (dynamic NUCA):
  – How should search happen?
  – Allocation/replacement/migration policies
  – Managing bandwidth/latency on the network
  – Prefetch mechanisms
  – Selective replication of blocks
  – Efficient write-throughs
  – Power/performance trade-offs

• P.S. We have simulators, etc., to help model such caches, in case anyone is interested
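Q5 and the first open problem in the list are linked: once blocks may migrate between banks, a block no longer has a fixed location, so every lookup has to search for it. The sketch below models one D-NUCA bank set with gradual promotion (each hit moves the block one bank closer to the core) and compares incremental search (probe the nearest bank first) with multicast search (probe all banks at once). The `BankSet` class, the bank latencies, and the placement policy are hypothetical illustration choices, not a description of any particular design or of the simulators mentioned above.

```python
class BankSet:
    """One D-NUCA bank set; bank 0 is closest to the requesting core.

    Hypothetical model: bank capacity and replacement are ignored so that
    only search cost and migration are shown.
    """

    def __init__(self, bank_latency):
        self.bank_latency = bank_latency           # cycles to probe each bank, nearest first
        self.banks = [set() for _ in bank_latency]

    def insert(self, block):
        # Assumed placement policy: a newly filled block starts in the farthest bank.
        self.banks[-1].add(block)

    def lookup_incremental(self, block):
        """Probe banks one after another, nearest first; return (hit, cycles)."""
        cycles = 0
        for i, bank in enumerate(self.banks):
            cycles += self.bank_latency[i]         # each probe adds that bank's latency
            if block in bank:
                self._promote(block, i)
                return True, cycles
        return False, cycles

    def lookup_multicast(self, block):
        """Probe every bank at once: faster, but every bank spends energy on the probe.
        Latency is that of the bank holding the block, or of the farthest bank on a miss."""
        for i, bank in enumerate(self.banks):
            if block in bank:
                self._promote(block, i)
                return True, self.bank_latency[i]
        return False, max(self.bank_latency)

    def _promote(self, block, i):
        # Gradual promotion: on a hit, move the block one bank closer to the core.
        if i > 0:
            self.banks[i].remove(block)
            self.banks[i - 1].add(block)


nuca = BankSet(bank_latency=[3, 5, 7, 9])   # hypothetical per-bank latencies (cycles)
nuca.insert(0x40)
for _ in range(4):
    print(nuca.lookup_incremental(0x40))    # (True, 24), (True, 15), (True, 8), (True, 3)
```

As the block migrates closer, incremental search gets cheaper; a block sitting in a far bank, or a miss, pays for every earlier probe. Multicast avoids that latency but spends bandwidth and energy probing every bank, which is the bandwidth/latency and power/performance tension listed among the open problems.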

Page 11: Lecture 25: Wrap-Up

Shameless Plug

• CS 7810: Advanced Architecture

• Lectures based on seminal (and still relevant) papers

• Not much work, apart from class project (in teams)

• Class project can involve as little as 1 week’s worth of concentrated effort…

• … or enough to get a paper out of it
  – you WILL work on novel problems
  – lots of help from me/other students with the simulator

Page 12: Lecture 25: Wrap-Up

3-D

• Imagine a similar problem in 3D

Figure: stacked dies, each tiled with C and P blocks

Page 13: Lecture 25: Wrap-Up

3-D

• Imagine a similar problem in 3D

Figure: stacked dies, each tiled with C and P blocks

Must schedule threads to manage temperature
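A minimal sketch of what scheduling threads to manage temperature could mean in a 3D stack, under a deliberately crude model: each core slot has a hypothetical thermal resistance to the heat sink (higher for dies farther from the sink), a slot's temperature is ambient plus power × resistance, and the scheduler greedily places the highest-power threads on the coolest slots. Real 3D thermal models also account for lateral and vertical heat flow between neighbouring cores; `thermal_schedule` and all numbers are illustrative assumptions.

```python
def thermal_schedule(thread_power, slot_resistance, ambient_c=45.0):
    """Greedily map threads to core slots in a 3D stack.

    thread_power:    watts drawn by each thread (hypothetical inputs)
    slot_resistance: K/W from each core slot to the heat sink; slots on dies
                     farther from the sink are assumed to have higher values
    Toy model: slot temperature = ambient + power * resistance, with no heat
    flow between neighbouring cores. Returns (thread -> slot, slot -> temp).
    """
    assert len(thread_power) <= len(slot_resistance), "need a slot per thread"
    hottest_first = sorted(range(len(thread_power)), key=lambda t: -thread_power[t])
    coolest_first = sorted(range(len(slot_resistance)), key=lambda s: slot_resistance[s])
    mapping = dict(zip(hottest_first, coolest_first))
    temps = {s: ambient_c + thread_power[t] * slot_resistance[s] for t, s in mapping.items()}
    return mapping, temps


# Hypothetical 2-die stack: slots 0-1 sit next to the heat sink, slots 2-3 on the far die.
mapping, temps = thermal_schedule([20, 5, 12, 8], [0.5, 0.5, 1.5, 1.5])
print(mapping)              # {0: 0, 2: 1, 3: 2, 1: 3}
print(max(temps.values()))  # 57.0
```

In this toy model, pairing the hottest thread with the coolest slot minimizes the peak estimated temperature, since swapping any pair can only raise the larger power × resistance product. For comparison, mapping thread i to slot i here would put the 12 W thread on the far die at 63.0.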

Page 14: Lecture 25: Wrap-Up

Single Thread Performance

• To improve single-thread performance, can even schedule a single thread's instructions across cores
  – large window of in-flight instructions to mine high ILP
  – requires high levels of speculation (power-hungry!)
  – any solutions?

Figure: stacked dies, each tiled with C and P blocks
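A rough way to see why the size of the in-flight window matters: the sketch below computes the ideal ILP of a dependence trace when at most `window` instructions can be in flight. It assumes unit latency, unlimited issue width, perfect branch prediction, and in-order retirement; the trace format (a list of producer indices per instruction) and `window_ilp` are hypothetical. Perfect prediction is exactly what a real machine lacks, which is why filling a large window takes heavy, power-hungry speculation.

```python
def window_ilp(deps, window):
    """Ideal ILP of a trace when at most `window` instructions are in flight.

    deps[i] lists the indices (all < i) of the instructions that i depends on.
    Assumes unit latency, unlimited issue width, perfect branch prediction,
    and in-order retirement (instruction i enters the window only after
    instruction i - window has retired).
    """
    n = len(deps)
    done = [0] * n     # cycle at which each instruction's result is ready
    retire = [0] * n   # cycle at which each instruction retires (in order)
    for i in range(n):
        start = max((done[d] for d in deps[i]), default=0)
        if i >= window:
            start = max(start, retire[i - window])   # wait for a free window slot
        done[i] = start + 1
        retire[i] = max(done[i], retire[i - 1] if i > 0 else 0)
    return n / retire[-1]


# Hypothetical trace: 4 independent dependence chains of 4 instructions each,
# laid out one chain after another in program order.
trace = [[] if i % 4 == 0 else [i - 1] for i in range(16)]
for w in (1, 4, 16):
    print(w, round(window_ilp(trace, w), 2))
# 1 1.0
# 4 1.78
# 16 4.0
```

The independent chains are far apart in program order, so they only overlap fully once the window is large enough to hold all of them at once.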

Page 15: Lecture 25: Wrap-Up

Heterogeneous CMPs (Alpha EVx and Cell)

Figure: cores labelled o-o-o (out-of-order) and in-o (in-order)
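A heterogeneous CMP lets each thread run on the core type that suits it. As a minimal sketch, assume each thread's IPC and power on an o-o-o core and an in-o core are already known (the numbers below are invented) and pick the core type with the lowest energy-delay product, EDP = energy × time = power × time². The EDP metric and the `CoreProfile`/`pick_core` names are illustrative assumptions; the slides do not say which metric or policy is used.

```python
from dataclasses import dataclass


@dataclass
class CoreProfile:
    ipc: float       # instructions per cycle of this thread on this core type
    power_w: float   # average power (watts) while running it
    freq_ghz: float  # clock frequency of this core type


def edp(profile, instructions):
    """Energy-delay product (joule-seconds): energy * time = power * time**2."""
    seconds = instructions / (profile.ipc * profile.freq_ghz * 1e9)
    return profile.power_w * seconds * seconds


def pick_core(profiles, instructions):
    """Name of the core type with the lowest EDP for this thread."""
    return min(profiles, key=lambda name: edp(profiles[name], instructions))


# Made-up profiles: a high-ILP thread and a memory-bound thread.
high_ilp = {"o-o-o": CoreProfile(2.0, 20.0, 2.0), "in-o": CoreProfile(0.8, 4.0, 2.0)}
mem_bound = {"o-o-o": CoreProfile(0.5, 15.0, 2.0), "in-o": CoreProfile(0.4, 4.0, 2.0)}
print(pick_core(high_ilp, 1e9))   # o-o-o
print(pick_core(mem_bound, 1e9))  # in-o
```

The high-ILP thread runs fast enough on the out-of-order core to justify its power; the memory-bound thread gains little from it, so the in-order core wins.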

Page 16: Lecture 25: Wrap-Up

NASCAR Applied to CPUs !?!

Source: Eric Rotenberg (NCSU)

Page 17: Lecture 25: Wrap-Up

Runahead Execution

Single thread in a baseline architecture

Single thread executing in tandem with a helper thread
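The benefit of running ahead can be approximated with a simple timing model. In the sketch below, while a demand miss is outstanding the machine looks ahead along the instruction stream and turns later misses into prefetches; the model does not distinguish whether the stalled thread itself or a helper thread does the look-ahead. It assumes a blocking in-order core, one cycle per non-missing instruction, a fixed hypothetical miss latency, and that later miss addresses do not depend on the missing data (real runahead cannot prefetch such dependent, pointer-chasing misses). Function names and the trace are illustrative.

```python
def baseline_cycles(trace, miss_lat):
    """Blocking in-order core: every miss stalls for the full miss latency."""
    return sum(1 + (miss_lat if is_miss else 0) for is_miss in trace)


def runahead_cycles(trace, miss_lat):
    """While a demand miss is outstanding, look ahead up to `miss_lat`
    instructions and issue prefetches for any later misses found there.
    Look-ahead results are discarded; only the prefetches survive."""
    arrives = {}   # instruction index -> cycle its prefetched data arrives
    cycle = 0
    for i, is_miss in enumerate(trace):
        if is_miss and i in arrives:
            # Prefetched during an earlier look-ahead: wait only for the
            # remaining latency (no new look-ahead is started in this model).
            cycle = max(cycle, arrives[i]) + 1
        elif is_miss:
            # Demand miss issued at `cycle`; look ahead one instruction per
            # cycle until its data returns.
            for k in range(1, miss_lat + 1):
                j = i + k
                if j >= len(trace):
                    break
                if trace[j] and j not in arrives:
                    arrives[j] = cycle + k + miss_lat
            cycle += 1 + miss_lat
        else:
            cycle += 1
    return cycle


trace = [False, True, False, False, True, False, True, False]   # True = cache miss
print(baseline_cycles(trace, 5))   # 23
print(runahead_cycles(trace, 5))   # 13
```

The two later misses fall within the look-ahead distance of the first one, so their latency overlaps with it; misses spaced farther apart than the miss latency see no benefit in this model.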

Page 18: Lecture 25: Wrap-Up

Reliability

Figure: SMT core 1 and SMT core 2 running threads P1, C2, P2 and C1; labels "For power" and "For performance"
