39
ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 www.ece.arizona.edu/~ece462 Instructor Name: Ali Akoglu (ece.arizona.edu/~akoglu) Office: ECE 356-B Phone: (520) 626-5149 Email: [email protected] Office Hours: Tuesdays 11:00 AM – 12:00 PM Thursdays 11:00 AM- 12:00 AM or

ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

  • View
    217

  • Download
    1

Embed Size (px)

Citation preview

Page 1: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210www.ece.arizona.edu/~ece462

Instructor

Name: Ali Akoglu (ece.arizona.edu/~akoglu)

Office: ECE 356-B

Phone: (520) 626-5149

Email: [email protected]

Office Hours: Tuesdays 11:00 AM – 12:00 PMThursdays 11:00 AM- 12:00 AM or by appointment

Page 2: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

Computer Architecture

2

Algorithm

Gates/Register-Transfer Level (RTL)

Application

Instruction Set Architecture (ISA)

Operating System/Virtual Machines

Devices

Programming Language

Circuits

Physics

Abstraction Layers

Page 3: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

3

Computer Architecture is Design and Analysis

Design

Analysis

Architecture is an iterative process:• Searching the space of possible designs• At all levels of computer systems

Creativity

Good IdeasGood Ideas

Mediocre IdeasBad Ideas

Cost /PerformanceAnalysis

Page 4: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

Computer Architecture

4

Compatibility

Cost of software development makes compatibility a major force in market

Applications

Technology

Applications suggest how to improve technology, provide revenue to fund development

Improved technologies make new applications possible

Page 5: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

5

Trends: The End of the Uniprocessor Era

Intel cancelled high performance uniprocessor, joined IBM and Sun for multiple processors

Intel cancelled high performance uniprocessor, joined IBM and Sun for multiple processors

Hardware based

Hardware

and softw

are, ILP!

Page 6: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

6

• Old Conventional Wisdom: Power is free, Transistors expensive

• New Conventional Wisdom: “Power wall” Power expensive, Xtors free (Can put more on chip than can afford to turn on)

• Old CW: Sufficiently increasing Instruction Level Parallelism via compilers, innovation (Out-of-order, speculation, VLIW, …)

• New CW: “ILP wall” law of diminishing returns on more HW for ILP

• Old CW: Multiplies are slow, Memory access is fast

• New CW: “Memory wall” Memory slow, multiplies fast (200 clock cycles to DRAM memory, 4 clocks for multiply)

• Old CW: Uniprocessor performance 2X / 1.5 yrs

• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall• Uniprocessor performance now 2X / 5(?) yrs

• Sea change in chip design: multiple “cores” (2X processors per chip / ~ 2 years)

• More simpler processors are more power efficient

Crossroads: Conventional Wisdom

Page 7: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

7

Instruction Set Architecture: Critical Interface

instruction set

software

hardware

• Properties of a good abstraction– Lasts through many generations (portability)– Used in many different ways (generality)– Provides convenient functionality to higher levels– Permits an efficient implementation at lower levels

Page 8: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

8

ISA vs. Computer Architecture

• Old definition of computer architecture = instruction set design

• Other aspects of computer design called implementation

• Our view is computer architecture >> ISA

• Architect’s job much more than instruction set design; technical hurdles today more challenging than those in instruction set design

• What really matters is the functioning of the complete system

– hardware, runtime system, compiler, operating system, and application

• Computer architecture is not just about transistors, individual instructions, or particular implementations

Page 9: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

9

Course Focus

Understanding the design techniques, machine structures, Understanding the design techniques, machine structures, technology factors, evaluation methods that will determine technology factors, evaluation methods that will determine the form of computers in 21st Centurythe form of computers in 21st Century

Technology ProgrammingLanguages

OperatingSystems History

Applications Interface Design(ISA)

Measurement & Evaluation

Parallelism

Computer Architecture:• Organization• Hardware/Software Boundary Compilers

Page 10: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

10

Related Courses

ECE369ECE369 ECE462/562

ECE462/562

ECE568ECE568

ECE474/574

ECE474/574

Basic computer organization, first look at pipelines + caches

Computer Architecture, First look at parallel

architectures

Parallel Processing

Computer Aided Logic Design, FPGAs

Strong

Prerequisite

ECE 576ECE 576

Computer Based Systems

ECE569ECE569

High Performance Computing, Advanced

Topics

Page 11: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

11

Introduction

• Text for ECE462/562: Hennessy and Patterson’s Computer Architecture, A Quantitative Approach, 5th Edition

• Topics1. Simple machine design (ISAs, microprogramming,

unpipelined machines, Iron Law, simple pipelines)2. Memory hierarchy (DRAM, caches, optimizations) plus

virtual memory systems, exceptions, interrupts3. Complex pipelining (score-boarding, out-of-order issue)4. Explicitly parallel processors (vector machines, VLIW

machines, multithreaded machines)5. Multiprocessor architectures (memory models, cache

coherence, synchronization)

Page 12: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

12

Your ECE462/562

•How would you like your ECE462/562?•Mix of lecture vs. discussion•Depends on how well reading is done before class•Goal is to learn how to do good systems research

•Learn a lot from looking at good work in the past•At commit point, you may chose to pursue your own new idea instead.

Page 13: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

13

Coping with ECE462/562

• Undergrads must have taken ECE274 and ECE369• Grad students with too varied background

• Review Appendix A, B, C• review of ISA, Datapath, Pipelining and Memory

Hierarchy

Page 14: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

14

Policies

• Background: ECE369 or equivalent, based on Patterson and Hennessy’s Computer Organization and Design•Prerequisite: ECE274 & ECE369 & Programming in C•3 to 4 assignments, 2 exams, final project•Grad students: extra exam questions, survey paper and presentation•NO LATE ASSIGNMENTS•Make-ups may be arranged prior to the scheduled activity. •Inquiries about graded material => within 3 days of receiving a grade.    •You are encouraged to discuss the assignment specifications with your instructor, and your fellow students. However, anything you submit for grading must be unique and should NOT be a duplicate of another source. •Read before the class•Participate and ask questions•Manage your time•Start working on assignments early

Page 15: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

Grading

Distribution of Components Grades Scale

Component Percentage Percentage  Grade 

Assignments+Quiz+Participation

35 90-100% A

Exam-I 15 80-89% B

Exam-II 15 70-79% C

Project 35 60-69% D

Total 100 Below 60% E

Page 16: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

16

Introduction

•Assignments and Project Pairs only Who is my partner? (email by 09/06)

• Assignment-0 due 08/28

• Announcements on the web

Page 17: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

17

Research Paper Reading

• As graduate students, you are now researchers• Most information of importance to you will be

in research papers• Ability to rapidly scan and understand research

papers is key to your success

Page 18: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

18

Project (Undergrad vs Grad)

• Transition from undergrad to grad student• ECE wants you to succeed, but you need to show

initiative• pick topic (more on this later)• meet 3 times with faculty to see progress• give oral presentation (grad students only)• written report like conference paper• 3 weeks work full time for 2 people• Opportunity to do “research in the small” to help make

transition from good student to research colleague

Page 19: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

19

Project (Undergrad vs Grad)

• Recreate results from research paper to see• If they are reproducible• If they still hold• Papers from ISCA, HPCA, MICRO, IPDPS, ISC

• Performance evaluation of an architecture• Using industry sponsored tools

• GEM5: gem5.org• Pin: pintool.org• SimpleScalar: simplescalar.com

• A complete end-to-end processor (UGs !!)• Take advantage of FPGAs!!

• Propose your own research project that is related to computer architecture

Page 20: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

20

Measuring Performance

• Topics: (Chapter 1)

Technology trendsPerformance equations

Page 21: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

21

Technology Trends and This Book

20091996When I took this class!

2002 2011Shift to multicore!Reduced emphasis on ILPIntroduce thread level P.

Reduced ILP to 1 chapter!

Request, Data, Thread, Instruction Level

Introduce: GPU, cloud computing, Smart phones, tablets!

Page 22: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

22

Problems

• Algorithms, Programming Languages, Compilers, Operating Systems, Architectures, Libraries, … not ready to supply Thread Level Parallelism or Data Level Parallelism for 1000 CPUs / chip,

• Architectures not ready for 1000 CPUs / chip• Unlike Instruction Level Parallelism, cannot be

solved by just by computer architects and compiler writers alone, but also cannot be solved without participation of computer architects

• 5th Edition Computer Architecture: A Quantitative Approach explores shift from Instruction Level Parallelism to Thread Level Parallelism / Data Level Parallelism

Page 23: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

23

Classes of Parallelism

• In Applications Data Level Parallelism

o Data items that can be operated on concurrently Task-level Parallelism

o Tasks of a work can operate independently • In Hardware

ILP: exploits DLP with compiler, pipelining, speculative execution

Vector Architectures and GPUs: exploit DLP by applying a single instruction to a collection of data

Thread-level parallelism: exploits DLP and TLP, tightly coupled hardware, interaction among threads

Request level parallelism: exploits largely decoupled tasks specified by the programmer

Page 24: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

24

Processor Technology Trends

• Shrinking of transistor sizes: 250nm (1997) 130nm (2002) 65nm (2007) 32nm (2010) 28nm(2011, AMD GPU, Xilinx FPGA) 22nm(2011, Intel Ivy Bridge, die shrink of the Sandy Bridge architecture)

• Transistor density increases by 35% per year and die size increases by 10-20% per year… more cores!

Page 25: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

25

Trends: Historical Perspective

Page 26: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

26

Power Consumption Trends

• Dyn power activity x capacitance x voltage2 x frequency

• Capacitance per transistor and voltage are decreasing, but number of transistors is increasing at a faster rate; hence clock frequency must be kept steady

• Leakage power is also rising

• Power consumption is already between 100-150W in high-performance processors today

3.3GHz Intel core i7: 130 watts

Page 27: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

27

Recent Microprocessor Trends

2004 2010

Source: Micron University Symp.

Transistors: 1.43x / year

Cores: 1.2 - 1.4x

Performance: 1.15x

Frequency: 1.05x

Power: 1.04x

Page 28: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

28

Improving Energy Efficiency Despite Flat Clock Rate

•Turn off the clock of inactive modules• Disable FP unit, core, etc.

•Dynamic Voltage-Frequency Scaling• Periods of low activity, lower the clock rate

•Low power mode• DRAMs lower power mode for extending the battery

•Overclocking• Intel, Turbo mode (2008), chip decides safe clock rate • i7 3.3 GHz, can run in short bursts for 3.6GHz

Page 29: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

29

Modern Processor Today

• Intel Core i7

Clock frequency: 3.2 – 3.33 GHz 45nm and 32nm products Cores: 4 – 6 Power: 95 – 130 W Two threads per core 3-level cache, 12 MB L3 cache Price: $300 - $1000

Page 30: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

30

Other Technology Trends

• DRAM density increases by 40-60% per year, latency has reduced by 33% in 10 years, bandwidth improves twice as fast as latency decreases

• Disk density improves by 100% every year, latency improvement similar to DRAM

Page 31: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

31

First Microprocessor Intel 4004, 1971

• 4-bit accumulator architecture

• 8m pMOS

• 2,300 transistors

• 3 x 4 mm2

• 750kHz clock

• 8-16 cycles/inst.

Page 32: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

32[ Personal Computing Ad, 11/81]

Hardware•Team from IBM building PC prototypes in 1979•Motorola 68000 chosen initially, but 68000 was late•8088 is 8-bit bus version of 8086 => allows cheaper system•Estimated sales of 250,000•100,000,000s sold

Page 33: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

33

• Carried in two tractor trailers, 12 tons + 8 tons

• Built for US Army Signal Corps

DYSEAC, first mobile computer!

Page 34: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

34

Measuring Performance

• Two primary metrics: wall clock time (response time for a program) and throughput (jobs performed in unit time)

• To optimize throughput, must ensure that there is minimal waste of resources

• Performance is measured with benchmark suites: a collection of programs that are likely relevant to the user

SPEC CPU 2006: cpu-oriented programs (for desktops) SPECweb, TPC: throughput-oriented (for servers) EEMBC: for embedded processors/workloads

Page 35: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

35

Performance

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle Inst Count CPI Clock Rate

Program X

Compiler X X

Inst. Set. X X

Organization X X

Technology X

Page 36: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

36

Amdahl’s Law

• Architecture design is very bottleneck-driven – make the common case fast, do not waste resources on a component that has little impact on overall performance/power

• Amdahl’s Law: performance improvements through an enhancement is limited by the fraction of time the enhancement comes into play

Page 37: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

37

Amdahl’s Law

• Considering an enhancement that runs 10 times faster than the original machine but is only usable 40% of the time. • Only 1.56x overall speedup

• An application is “almost all” parallel: 90%. Speedup using • 10 processors => 5.3x• 100 processors => 9.1x• 1000 processors => 9.9x

Page 38: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

38

Principle of Locality

• Most programs are predictable in terms of instructions executed and data accessed

• Temporal locality: a program will shortly re-visit X

• Spatial locality: a program will shortly visit X+1

Page 39: ECE 462/562 Computer Architecture and Design T-Th 12:30-1:45 in HARV210 ece462 Instructor Name:Ali Akoglu (ece.arizona.edu/~akoglu)

39

Exploit Parallelism

• Most operations do not depend on each other – hence, execute them in parallel

• At the circuit level, simultaneously access multiple ways of a set-associative cache

• At the organization level, execute multiple instructions at the same time

• At the system level, execute a different program while one is waiting on I/O