Computer Science 246 Advanced Computer Architecture Spring 2010 Harvard University Instructor: Prof. David Brooks [email protected]

Computer Science 246 Advanced Computer Architecturedbrooks/cs246/cs246-lecture-intro.pdf · Advanced Computer Architecture Spring 2010 Harvard University ... • Pentium Architecture

Computer Science 246
Advanced Computer Architecture
Spring 2010
Harvard University
Instructor: Prof. David Brooks ([email protected])

Course Outline
• Instructor
• Prerequisites
• Topics of Study
• Course Expectations
• Grading
• Class Scheduling Conflicts

Instructor
• Instructor: David Brooks ([email protected])
• Office Hours: after class M/W, MD141, else stop by/email whenever

Prerequisites
• CS141 (or equivalent)
  • Logic Design (computer arithmetic)
  • Basic RISC ISA
  • Pipelining (control/data hazards, forwarding)
  • i.e. Hennessy & Patterson Jr. (HW/SW Interface)
• C Programming, UNIX for Project (or similar skills)
• Compilers, OS, Circuits/VLSI background is a plus, not needed

Readings and Resources
• Text: “Computer Architecture: A Quantitative Approach,” Hennessy and Patterson
• Key research papers (available on the web)
• SimpleScalar toolset
  • Widely used architectural simulator
  • SPEC2000 benchmarks
  • Will be used for some HW/Projects
  • Power/thermal modeling extensions available

Course Expectations
• Lecture Material
  • Short homework/quizzes to cover this material
• Seminar style part of course:
  • Expectation: you will read the assigned papers before class so we can have a lively discussion about them
  • Paper reviews – short “paper review” highlighting interesting points, strengths/weaknesses of the paper
  • Bring discussion points to class
  • Discussion leadership – Students will be assigned to present the paper/lead the discussions

Course Expectations
• Course project
  • Several possible ideas will be given
  • Also you may come up with your own
  • Schedule meetings with groups (1/2hr per project) to discuss results/progress
  • There will be two presentations
    – First, a short “progress update” (mid April)
    – Second, a final presentation scheduled at the end of reading week
    – Finally, a project writeup written in research paper style

Grading
• Grade Formula
  • Homeworks and/or Quiz – 25%
  • Class Participation – 25%
  • Project (including final project presentation) – 50%

Topics of CS246
• Introduction to Computer Architecture and Power-Aware Computing
• Modern CPU Design
  • Deep pipelines/Multiple Issue
  • Dynamic Scheduling/Speculative Execution
  • Memory Hierarchy Design
• Pentium Architecture Case Study
• Multiprocessors and Multithreading
• Embedded computing devices
• Dynamic Frequency/Voltage Scaling
• Thermal-aware processor design
• Power-related reliability issues
• System-Level Power Issues
• Software approaches to power management

Readings from Previous Years
• High-level Power modeling (power abstractions)
• Power Measurement for OS control
• Temperature Aware Computing
• Di/Dt Modeling
• Leakage Power Modeling and Control
• Frequency and Voltage Scheduling Algorithms
• Power in Data Centers, Google Cluster Architecture
• Dynamic Power Reduction in Memory
• Disk Power Modeling/Management
• Application and Compiler Power Management
• Dynamic adaptation for Multimedia applications
• Architectures for wireless sensor devices
• Low-power routing algorithms for wireless networked devices
• Intel XScale Microprocessor, IBM Watchpad
• Human Powered Computing

Why take CS246?
• Learn how modern computing hardware works
• Understand where computing hardware is going in the future
  • And learn how to contribute to this future…
• How does this impact system software and applications?
  • Essential to understand OS/compilers/PL
  • For everyone else, it can help you write better code!
• How are future technologies going to impact computing systems?

Architectural Inflection Point
• Computer architects strive to give maximum performance with programmer abstraction
  • Compilers, OS part of this abstraction
  • e.g. Pipelining, superscalar, speculative execution, branch prediction, caching, virtual memory…
• Technology has brought us to an inflection point
  • Multiple processors on a single chip -- Why?
    – Design complexity, ILP/pipelining-limits, power dissipation, etc
  • How to provide the abstraction?
  • Some burden will shift back to programmers

Topics of Study
• Focus on what modern computer architects worry about (both academia and industry)
• Get through the basics of modern processor design
• Look at technology trends: multithreading, CMP, power-, reliability-aware design
• Recent research ideas, and the future of computing hardware

What is Computer Architecture?

[Figure: layer diagram. Software: Applications (AI, DB, Graphics), Operating System, Prog. Lang, Compilers. Hardware: Instruction Set Architecture, Microarchitecture, System Architecture, VLSI/Hardware Implementations. Application Trends and Technology Trends are shown as outside influences.]

Trends in Computing…

Technology Constraints
• Power
• Design variations
• Design costs

Technology Advances
• 2x Transistors / generation
• New memory technologies
• 3D stacking

Software Trends
• Runtime software
• Virtual machines
• Multi-threaded

Application Trends
• Mobile and embedded
• Interactions with physical-world
• Data-centric computing

Application Areas
• General-Purpose Laptop/Desktop
  • Productivity, interactive graphics, video, audio
  • Optimize price-performance
  • Examples: Intel Pentium 4, AMD Athlon XP
• Embedded Computers
  • PDAs, cell-phones, sensors => Price, Energy efficiency, Form-Factor
  • Examples: Intel XScale, StrongARM (SA-110)
  • Game Machines, Network uPs => Price-Performance
  • Examples: Sony Emotion Engine, IBM 750FX

Application Areas
• Commercial Servers
  • Database, transaction processing, search engines
  • Performance, Availability, Scalability
  • Server downtime could cost a brokerage company more than $6M/hour
  • Examples: Sun Fire 15K, IBM p690, Google Cluster
• Scientific Applications
  • Protein Folding, Weather Modeling, CompBio, Defense
  • Floating-point arithmetic, Huge Memories
  • Examples: IBM DeepBlue, BlueGene, Cray T3E, etc.

A Bit of History…

[Figure: Normalized CPU Performance (log scale, 10^0 to 10^6) vs. Year (1975 to 2020). Historical Trend: 1.58x/yr. Source: Hennessy & Patterson, 4th ed.]
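
To get a feel for what a 1.58x/yr trend means, the short sketch below compounds that rate over several years and computes the implied doubling time. The function names are illustrative, and the only input taken from the slide is the 1.58 factor; treating the rate as constant over the whole period is a simplifying assumption.

```python
import math

ANNUAL_GROWTH = 1.58  # historical trend quoted on the slide


def cumulative_speedup(years: float, rate: float = ANNUAL_GROWTH) -> float:
    """Total speedup after compounding a constant annual rate for `years` years."""
    return rate ** years


def doubling_time_years(rate: float = ANNUAL_GROWTH) -> float:
    """Years needed for performance to double at a constant annual rate."""
    return math.log(2) / math.log(rate)


if __name__ == "__main__":
    print(f"doubling time: {doubling_time_years():.2f} years")
    print(f"speedup over 10 years: {cumulative_speedup(10):.0f}x")
```

At this rate performance doubles roughly every year and a half, which is why a decade of the trend amounts to about two orders of magnitude.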

Moore’s Transistors

[Figure: Desktop CPUs. Source: Bowman, Intel]

Every 18-24 months:
• Feature sizes shrink by 0.7x
• Number of transistors per die increases by 2x
• Speed of transistors increases by 1.4x
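
The three per-generation factors are linked by simple geometry, which the sketch below makes explicit. It assumes that transistor area scales with the square of the feature size and that gate delay scales roughly with the linear dimension; both are first-order idealizations, and the function names are illustrative.

```python
SHRINK = 0.7  # linear feature-size scaling per generation, from the slide


def area_scale(shrink: float = SHRINK) -> float:
    """Area of one transistor relative to the previous generation."""
    return shrink ** 2


def density_gain(shrink: float = SHRINK) -> float:
    """Transistors per unit area relative to the previous generation."""
    return 1.0 / area_scale(shrink)


def speed_gain(shrink: float = SHRINK) -> float:
    """Switching speed gain, assuming delay tracks the linear dimension."""
    return 1.0 / shrink


if __name__ == "__main__":
    print(f"area per transistor: {area_scale():.2f}x")
    print(f"transistors per die (same die area): {density_gain():.2f}x")
    print(f"transistor speed: {speed_gain():.2f}x")
```

A 0.7x shrink halves the area per transistor (0.49x), which is where the 2x transistor count and the roughly 1.4x speed figures come from.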

How have we used these transistors?
• More functionality on one chip
  • Early 1980s – 32-bit microprocessors
  • Late 1980s – On Chip Level 1 Caches
  • Early/Mid 1990s – 64-bit microprocessors, superscalar (ILP)
  • Late 1990s – On Chip Level 2 Caches
  • Early/Mid 2000s – Chip Multiprocessors, On Chip Level 3 Caches
• What is next?
  • How much more cache can we put on a chip? (Itanium2)
  • How many more cores can we put on a chip? (Niagara, etc)
  • What else can we put on chips?

Performance vs. Technology Scaling

• Architectural Innovations
  • Massive pipelining (good and bad!)
  • Huge caches
  • Branch Prediction, Register Renaming, OOO-issue/execution, Speculation (hardware/software versions of all of these)
• Circuit/Logic Innovations
  • New logic circuit families (dynamic logic)
  • Better CAD tools
  • Advanced computer arithmetic

Constant-Field (Dennard) Scaling

• Traditional scaling has stopped

• When Vdd scaling slows, so does Energy scaling

$E = CV^2$

[Source: Nowak, IBM]
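
A small sketch of why slower Vdd scaling hurts energy scaling, using the slide's switching-energy relation (energy = C times V squared). Under idealized constant-field scaling, capacitance and supply voltage both shrink by the same linear factor per generation; the second case assumes the voltage does not scale at all. The 0.7 factor and the function names are illustrative assumptions.

```python
def switching_energy(capacitance: float, vdd: float) -> float:
    """Energy per switching event: C * V^2."""
    return capacitance * vdd ** 2


def energy_ratio(scale: float, voltage_scales: bool) -> float:
    """Energy per switch after one generation, relative to the previous one.

    `scale` is the linear scaling factor (e.g. 0.7). Capacitance is assumed
    to scale with it; voltage scales with it only if `voltage_scales` is True.
    """
    c_new = scale
    v_new = scale if voltage_scales else 1.0
    return switching_energy(c_new, v_new) / switching_energy(1.0, 1.0)


if __name__ == "__main__":
    print(f"constant-field scaling: {energy_ratio(0.7, True):.3f}x energy")
    print(f"Vdd held constant:      {energy_ratio(0.7, False):.3f}x energy")
```

With voltage scaling the energy per switch drops to about a third each generation; with Vdd stuck it drops only as fast as the capacitance does.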

The power problem

• Must design with strict power envelopes
  – 130W servers, 65W desktop, 10-30W laptops, 1-2W hand-held

[Figure: Intel processors (386, 486, Pentium, P6, Pentium 2, Pentium 3, Pentium 4, Prescott, Conroe) plotted against the reference levels “Hot Plate” and “Nuclear Reactor”, with a Power Ceiling.]

Challenges in the 21st Century

[Figure: Normalized CPU Performance (log scale, 10^0 to 10^6) vs. Year (1975 to 2020). Historical Trend: 1.58x/yr.]

Intel’s Technology Outlook

High Volume Manufacturing  | 2004  | 2006 | 2008 | 2010 | 2012 | 2014 | 2016 | 2018
Technology Node (nm)       | 90    | 65   | 45   | 32   | 22   | 16   | 11   | 8
Integration Capacity (BT)  | 2     | 4    | 8    | 16   | 32   | 64   | 128  | 256
Delay = CV/I scaling       | 0.7   | ~0.7 | >0.7 | Delay scaling will slow down
Energy/Logic Op scaling    | >0.35 | >0.5 | >0.5 | Energy scaling will slow down
Bulk Planar CMOS           | High Probability → Low Probability
Alternate, 3G etc          | Low Probability → High Probability
Variability                | Medium → High → Very High
ILD (K)                    | ~3    | <3   | Reduce slowly towards 2-2.5
RC Delay                   | 1     | 1    | 1    | 1    | 1    | 1    | 1    | 1
Metal Layers               | 6-7   | 7-8  | 8-9  | 0.5 to 1 layer per generation

[Source: Borkar, Intel]

Parameter Variations

• Impacts maximum clock frequency and/or power
• Worst-case delay affected by variations
• Combined effects can be significant in current generations
• Most issues are more serious with technology scaling

[Figure: taxonomy of Variations: Process; Runtime (Voltage, Temperature)]

Process Variations

[Figure: Die-to-Die (D2D) variations at Wafer Scale; Within-Die (WID) variations, split into Correlated (Die Scale) and Random (Feature Scale, RDF & LER). Source: Bowman, Intel]

Voltage Variations: Temporal

[Figure: VCC (V), 1.00 to 1.25, vs. Time (ns), 0 to 50. Annotations: VMAX: Reliability; VMIN: Performance; Margin Inefficiency.]
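
One rough way to see why the voltage margin is labeled an inefficiency: switching energy goes as C times V squared, so any supply voltage above what the circuit actually needs costs energy quadratically. The sketch below computes that overhead. The voltages used are hypothetical and are not read from the figure.

```python
def energy_overhead(v_supply: float, v_needed: float) -> float:
    """Fractional extra switching energy from running at `v_supply`
    instead of `v_needed`, using E = C * V^2 (C cancels out)."""
    if v_needed <= 0 or v_supply < v_needed:
        raise ValueError("expect 0 < v_needed <= v_supply")
    return (v_supply / v_needed) ** 2 - 1.0


if __name__ == "__main__":
    # Hypothetical numbers: 1.00 V would suffice without droops,
    # but the supply is set to 1.10 V to leave margin for them.
    overhead = energy_overhead(v_supply=1.10, v_needed=1.00)
    print(f"energy overhead of the margin: {overhead:.0%}")
```

Under these assumed numbers a 10% voltage margin costs about 21% in switching energy.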

Voltage Variations: Spatial + Temporal

[Figure: two panels: 1 core active, 3 cores idle; 3 cores active, 1 core idle]

Power-Aware Computing Applications

[Figure: axes Energy-Constrained Computing and Temperature/di-dt-Constrained]

Where does the juice go in laptops?

• Others have measured ~55% processor increase under max load in laptops [Hsu+Kremer, 2002]

Packaging cost
From Cray (local power generator and refrigeration)…

Source: Gordon Bell, “A Seymour Cray perspective” http://www.research.microsoft.com/users/gbell/craytalk/

Packaging cost
To today…
• IBM S/390: refrigeration:
  • Provides performance (2% perf for 10ºC) and reliability

Source: R. R. Schmidt, B. D. Notohardjono “High-end server low temperature cooling” IBM Journal of R&D

Intel Itanium packaging

Complex and expensive (note heatpipe)

Source: H. Xie et al. “Packaging the Itanium Microprocessor” Electronic Components and Technology Conference 2002

P4 packaging

• Simpler, but still…

Source: Intel web site

[Figure: Total Power-Related PC System Cost ($), 0 to 40, vs. Power (Watts), 0 to 40. From Tiwari, et al., DAC98]

Cooking Aware Computing

Server Farms
• Internet data centers are like heavy-duty factories
  • e.g. small Datacenter: 25,000 sq. ft, 8000 servers, 2 MW
  • Intergate Datacenter, Tukwila, WA: 1.5 million sq. ft, ~500 MW
• Wants lowest net cost per server per sq foot of data center space
• Cost driven by:
  • Racking height
  • Cooling air flow
  • Power delivery
  • Maintenance ease (access, weight)
  • 25% of total cost due to power
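
The data center figures on this slide can be turned into per-server and per-area power numbers with a few divisions, as sketched below. The inputs come straight from the slide; the assumption that the quoted facility power is split evenly across servers (and so includes cooling and power-delivery overhead) is mine, and the names are illustrative.

```python
def watts_per_server(total_watts: float, servers: int) -> float:
    """Facility power divided evenly across servers (overheads included)."""
    return total_watts / servers


def watts_per_sqft(total_watts: float, sqft: float) -> float:
    """Facility power density in watts per square foot."""
    return total_watts / sqft


# Figures quoted on the slide.
SMALL = {"watts": 2e6, "servers": 8000, "sqft": 25_000}
INTERGATE = {"watts": 500e6, "sqft": 1.5e6}

if __name__ == "__main__":
    print(f"small DC: {watts_per_server(SMALL['watts'], SMALL['servers']):.0f} W/server")
    print(f"small DC: {watts_per_sqft(SMALL['watts'], SMALL['sqft']):.0f} W/sq ft")
    print(f"Intergate: {watts_per_sqft(INTERGATE['watts'], INTERGATE['sqft']):.0f} W/sq ft")
```

The small example works out to 250 W per server and 80 W per square foot, which helps explain why cooling air flow and power delivery appear in the list of cost drivers.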

Environment
• Environmental Protection Agency (EPA): computers account for 10% of commercial electricity consumption
  • This incl. peripherals, possibly also manufacturing
  • A DOE report suggested this percentage is much lower (3.0-3.5%)
  • No consensus, but it’s still a lot
  • Interesting to look at the numbers:
    – http://enduse.lbl.gov/projects/infotech.html
• Data center growth was cited as a contribution to the 2000/2001 California Energy Crisis
• Equivalent power (with only 30% efficiency) for AC
• CFCs used for refrigeration
• Lap burn
• Fan noise

Now we know why power is important
• What can we do about it?
• Two components to the problem:
  • #1: Understand where and why power is dissipated
  • #2: Think about ways to reduce it at all levels of computing hierarchy
• In the past, #1 was difficult to accomplish except at the circuit level
• Consequently most low-power efforts were circuit related

Modeling + Design

• First Component (Modeling/Measurement):
  • Come up with a way to:
    – Diagnose where power is going in your system
    – Quantify potential savings
• Second Component (Design)
  • Try out lots of ideas
  • Or characterize tradeoffs of ideas…
• This class will focus on both of these at many levels of the computing hierarchy
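
As a toy illustration of the modeling component, the sketch below attributes dynamic power to a few units and then quantifies the saving from reducing one unit's activity. It uses the common first-order model (activity × C × V² × f) rather than any tool named in the course; the unit names, capacitances and activity factors are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    cap_farads: float  # effective switched capacitance (made-up value)
    activity: float    # average fraction of cycles in which the unit switches


def dynamic_power(unit: Unit, vdd: float, freq_hz: float) -> float:
    """Average dynamic power in watts: activity * C * V^2 * f."""
    return unit.activity * unit.cap_farads * vdd ** 2 * freq_hz


def power_breakdown(units, vdd: float, freq_hz: float) -> dict:
    """Diagnose: each unit's share of total dynamic power (shares sum to 1)."""
    watts = {u.name: dynamic_power(u, vdd, freq_hz) for u in units}
    total = sum(watts.values())
    return {name: w / total for name, w in watts.items()}


def potential_saving(units, vdd, freq_hz, target: str, new_activity: float) -> float:
    """Quantify: watts saved if `target`'s activity drops to `new_activity`."""
    before = sum(dynamic_power(u, vdd, freq_hz) for u in units)
    changed = [
        Unit(u.name, u.cap_farads, new_activity) if u.name == target else u
        for u in units
    ]
    after = sum(dynamic_power(u, vdd, freq_hz) for u in changed)
    return before - after


# Hypothetical units; the numbers are illustrative only.
UNITS = [
    Unit("fetch", 2e-9, 0.5),
    Unit("execute", 4e-9, 0.25),
    Unit("cache", 6e-9, 0.5),
]

if __name__ == "__main__":
    print(power_breakdown(UNITS, vdd=1.0, freq_hz=1e9))
    print(potential_saving(UNITS, 1.0, 1e9, target="cache", new_activity=0.25))
```

Even a model this crude separates the two steps on the slide: the breakdown shows where power goes, and the what-if function puts a number on an idea before anyone designs it.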

Next Time
• Course website: http://www.eecs.harvard.edu/~dbrooks/cs246
• Background reading for the first week
• Questions?