Computer Systems Architecture I CSE 560M Lecture 2 Prof. Patrick Crowley


Computer Systems

Architecture I

CSE 560M

Lecture 2

Prof. Patrick Crowley


Plan for Today

• Questions

• Administrivia

• Class background

• Today’s discussion

• Assignment


Administrivia

• My office hours:

– No one “good” time

– By appointment, times available each day

• Shakir’s office hours

– M&W, 5:30pm-6:30pm

– Bryan 422


2009 Class Background

[Chart: share of students (0% to 100%) reporting much, some, or no background in Architecture, Digital Design, and VHDL]


Introduction

“Speed is not everything but it’s kilometers

ahead of whatever is in second place”

Ed McCreight

The Dragon Computer System

Xerox PARC September, 1984


Computer Design: Make the

Common Case Fast

Amdahl’s law (speedup, S):

$$S = \frac{\text{Performance for task with enhancement}}{\text{Performance for task without enhancement}}$$

$$S = \frac{\text{Exec. time without enhancement}}{\text{Exec. time with enhancement}}$$
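
As a minimal sketch, the speedup ratio can be computed directly from two measured execution times. The second function uses the usual fractional form of Amdahl's law, S = 1 / ((1 - f) + f / s), which is not written out on the slide; the function names and sample numbers are illustrative assumptions.

```python
def speedup_from_times(time_without: float, time_with: float) -> float:
    """Speedup S = execution time without enhancement / execution time with it."""
    if time_without <= 0 or time_with <= 0:
        raise ValueError("execution times must be positive")
    return time_without / time_with


def amdahl_speedup(fraction_enhanced: float, enhancement_speedup: float) -> float:
    """Overall speedup when only part of the execution time is enhanced.

    Standard fractional form (an addition, not from the slide):
        S = 1 / ((1 - f) + f / s)
    where f is the fraction of time that benefits and s is its local speedup.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if enhancement_speedup <= 0:
        raise ValueError("enhancement_speedup must be positive")
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / enhancement_speedup)


if __name__ == "__main__":
    # Hypothetical measurement: 10 s before, 8 s after the enhancement.
    print(speedup_from_times(10.0, 8.0))
    # A 10x enhancement helps little unless it covers the common case.
    for f in (0.1, 0.5, 0.9):
        print(f, round(amdahl_speedup(f, 10.0), 3))
```

The loop illustrates the slide's point: the same 10x enhancement yields a far larger overall speedup when it applies to the common case.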


Outline

• Types of computer systems

• Technology trends

• Explaining processor performance improvements

• Performance evaluation

• Fallacies and Pitfalls


Classes of Computer Systems

• Desktop

– Intel IA-32, AMD, IBM PowerPC

• Servers

– Intel IA-32, Intel IA-64, Sun SPARC, AMD

• Embedded

– MIPS, ARM, NEC, Motorola


Classes of Computer Systems

• Consider one metric: worldwide unit sales

• 1980: 724K PCs

• 1986: 9M PCs

• 2002: 130M PCs

• 2002: 3500 Intel Itanium processors

• 2002: 36M game consoles

• 2002: 480M mobile phones

• 2004

– 178M PCs

– 600M mobile phones

• 2006

– 230M PCs

– 960M mobile phones

• 2008

– 299M PCs

– 1.2B mobile phones

[Chart: worldwide unit sales of PCs and mobile phones in 2004, 2006, and 2008; vertical axis 0 to 1200, in millions of units]


Technology Trends

• CPU/Microprocessor

– Transistor count increases by about 55% per year

– Performance improvement has traditionally been better than that

• Memory

– Density increases, bandwidth increases, access time is stagnant (although new memory architectures help)

• I/O

– Disk density: 100% per year!

– Disk access time: 30% in 10 years

– Networks: periodic order-of-magnitude increases
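
To compare these rates on a common footing, the sketch below converts an annual growth rate into a doubling time and a multi-year improvement into an equivalent annual rate. It assumes compound growth and reads "30% in 10 years" as a 30% total improvement over the decade; both are interpretations, not statements from the slide.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a compound annual growth rate (0.55 = 55%)."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


def annualized_rate(total_improvement: float, years: float) -> float:
    """Equivalent compound annual rate for a total improvement over `years`."""
    return (1.0 + total_improvement) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    print(f"Transistor count (55%/yr) doubles every "
          f"{doubling_time_years(0.55):.2f} years")
    print(f"Disk density (100%/yr) doubles every "
          f"{doubling_time_years(1.00):.2f} years")
    print(f"Disk access time (30% in 10 yr) improves about "
          f"{annualized_rate(0.30, 10) * 100:.1f}% per year")
```

Under these assumptions, capacity doubles in a year or two while access time improves by only a few percent per year, which is the imbalance the slide highlights.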


Computer Generations

Generation  Date       Technology
1           1950-1959  Vacuum Tubes
2           1960-1968  Transistors
3           1969-1977  Integrated Circuit
4           1978-1999  LSI, VLSI
5           2000-20xx  VLSI …


Processor-Memory Perf. Gap


Explaining Processor Improvements

• Technology

– Faster clock

– More transistors

• Architecture

– Extensive pipelining

– More transistors enable new functionality

  • Multiple functional units

  • Superscalar execution

  • Out-of-order execution

– On-chip caches, TLBs

– Instruction fetch units, branch prediction

– Multiple cores, thread contexts

– Greater on-chip integration


Clock Rate

Chip              Date  Clock Freq. (MHz)  Clock Period (ns)
Intel 8086        1978           4.77      200
Intel 386         1985          40.00       25
DEC Alpha (v1)    1990         100.00       10
DEC Alpha (v2)    1994         300.00        3.33
Intel P4          2002       2,000.00        0.50
Intel Xeon L7455  2008       2,130.00        0.46

Time for a signal to travel 1 cm on-chip: ~1.00 ns
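
The clock period column follows from the frequency column, since period = 1 / frequency. A minimal sketch of the conversion, using the frequencies listed on the slide (the function name is illustrative):

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz (1 MHz -> 1000 ns)."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / freq_mhz


# Frequencies in MHz as listed in the slide's table.
CHIPS = [
    ("Intel 8086", 4.77),
    ("Intel 386", 40.0),
    ("DEC Alpha (v1)", 100.0),
    ("DEC Alpha (v2)", 300.0),
    ("Intel P4", 2000.0),
    ("Intel Xeon L7455", 2130.0),
]

if __name__ == "__main__":
    for name, mhz in CHIPS:
        print(f"{name:<18} {mhz:>8.2f} MHz  {clock_period_ns(mhz):>8.2f} ns")
```

The computed period for the 8086 is close to 210 ns, so the slide's 200 ns appears to be rounded. From the P4 onward the period is below the roughly 1 ns a signal needs to cross 1 cm of chip.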


Intel x86 Progression

Clock rate stagnates, cores increase

Chip         Date    Transistor Count  Speed (MHz)
4004         Nov-71  2300              0.108
8008         Apr-72  3500              0.2
8080         Apr-74  6000              2
8086         Jun-78  29000             10
8088         Jun-79  29000             10
286          Feb-82  134000            12.5
386          Oct-85  275000            33
486          Apr-89  1.2M              50
Pentium      Mar-93  3.1M              66
Pentium Pro  Mar-95  5.5M              166
Pentium II   Jul-97  7.5M              300
Pentium III  Feb-99  9.5M              1200
Pentium 4    Nov-00  42M               1800
P4-HT        Nov-02  188M              3060
Pentium D    May-05  169M              2800
Core 2 Duo   Jul-06  291M              3000
Xeon L7455   Jul-08  1900M             2130
Xeon E5450   Jan-09  731M              2530


Performance Evaluation Basics

• Performance inversely proportional to execution time

• Elapsed time includes:

– User + system; I/O; memory accesses

• CPU time includes:

– User + system CPU (no I/O)

• CPU execution time for a single program execution:

– Cycles Per Instruction (CPI)

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
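
A short worked example of the CPU time equation (CPU time = instruction count × CPI × clock cycle time). The workload numbers below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cpu_time_seconds(instruction_count: float, cpi: float,
                     clock_cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


def cpu_time_from_clock_rate(instruction_count: float, cpi: float,
                             clock_rate_hz: float) -> float:
    """Same quantity expressed with clock rate (cycle time = 1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Hypothetical program: 1 billion instructions, CPI of 1.5, 2 GHz clock.
    ic, cpi = 1e9, 1.5
    print(cpu_time_seconds(ic, cpi, 0.5e-9))        # 0.5 ns cycle time
    print(cpu_time_from_clock_rate(ic, cpi, 2.0e9))  # equivalent 2 GHz form
```

Both calls describe the same machine, so they should agree up to floating-point rounding (about 0.75 s for these numbers).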


Components of CPI

• Ideal CPI = 1

• Classes of instructions

– RISC machines: alu, control flow, f.p., load-store

– CISC machines: string instructions

• We will discuss “contributions to CPI from”:

– Memory hierarchy

– Branches (misprediction)

– Pipeline hazards


Components of CPI

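$$\text{CPI} = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times \text{CPI}_i$$

where $IC_i$ is the number of executed instructions in class $i$ and $\text{CPI}_i$ is the cycles per instruction for that class.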
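
The overall CPI is the average of the per-class CPIs, weighted by how often each class executes. A minimal sketch, assuming a made-up instruction mix (the counts and per-class CPIs are hypothetical, not measurements):

```python
def average_cpi(mix: dict[str, tuple[int, float]]) -> float:
    """Weighted CPI: sum over classes of (IC_i / total instruction count) * CPI_i.

    `mix` maps an instruction class name to (instruction count, CPI of class).
    """
    total_instructions = sum(count for count, _ in mix.values())
    if total_instructions == 0:
        raise ValueError("instruction mix is empty")
    total_cycles = sum(count * cpi for count, cpi in mix.values())
    return total_cycles / total_instructions


# Hypothetical mix per 100 executed instructions.
EXAMPLE_MIX = {
    "alu": (50, 1.0),
    "load-store": (30, 2.0),
    "control flow": (15, 3.0),
    "f.p.": (5, 4.0),
}

if __name__ == "__main__":
    print(average_cpi(EXAMPLE_MIX))  # (50 + 60 + 45 + 20) / 100 = 1.75
```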


Measuring/Modeling CPU

Performance

• Hardware counters on a real CPU

• Instrumented execution of programs running on a real system

– Binary re-writing

– Debugger

• Instruction set simulation or interpretation
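
To illustrate the last approach, here is a toy instruction set interpreter that counts executed instructions by opcode. The four-instruction ISA, the register names, and the sample program are all invented for this sketch; real simulators model a real ISA and usually timing as well.

```python
from collections import Counter


def run(program, max_steps=10_000):
    """Interpret a toy program; return (registers, per-opcode dynamic counts)."""
    regs = Counter()    # unset registers read as 0
    counts = Counter()  # dynamic instruction count per opcode
    pc = 0
    steps = 0
    while pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded (possible infinite loop)")
        op, *args = program[pc]
        counts[op] += 1
        steps += 1
        if op == "li":        # li rd, imm      : rd = imm
            rd, imm = args
            regs[rd] = imm
            pc += 1
        elif op == "add":     # add rd, rs, rt  : rd = rs + rt
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
            pc += 1
        elif op == "addi":    # addi rd, rs, imm: rd = rs + imm
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
            pc += 1
        elif op == "bnez":    # bnez rs, target : branch if rs != 0
            rs, target = args
            pc = target if regs[rs] != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode: {op}")
    return dict(regs), counts


# Sums 5 + 4 + 3 + 2 + 1 into r2 with a countdown loop.
PROGRAM = [
    ("li", "r1", 5),            # 0: loop counter
    ("li", "r2", 0),            # 1: accumulator
    ("add", "r2", "r2", "r1"),  # 2: accumulator += counter
    ("addi", "r1", "r1", -1),   # 3: counter -= 1
    ("bnez", "r1", 2),          # 4: repeat while counter != 0
]

if __name__ == "__main__":
    registers, counts = run(PROGRAM)
    print(registers["r2"], dict(counts), sum(counts.values()))
```

Per-opcode counts like these are the IC_i inputs needed for a per-class CPI calculation.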


Benchmarks

• Desktop

– SPEC, integer and floating-point

– Commercial workloads: SYSmark, Winstone

• Servers

– SPECweb

– TPC-A, B, C

• Embedded

– EEMBC

– Other application-specific suites


Important Future Topics

• Computer Architecture Methodology

• Pipelining

• Locality


Fallacy

“The relative performance of two processors

with the same instruction set architecture

(ISA) can be judged by clock rate or by the

performance of a single benchmark suite.”


Intel P4 (1.7GHz) vs P3 (1GHz)
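
A hedged numeric illustration of why clock rate alone can mislead: with the same instruction count, a higher-clocked design can still be slower if its CPI is higher. The CPI values below are hypothetical and are not measurements of the P4 or P3; only the two clock rates come from the slide title.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


INSTRUCTIONS = 1_000_000_000  # same program, same ISA, same instruction count

# Hypothetical CPIs chosen only to show the effect.
fast_clock = cpu_time_s(INSTRUCTIONS, cpi=2.0, clock_rate_hz=1.7e9)
slow_clock = cpu_time_s(INSTRUCTIONS, cpi=1.0, clock_rate_hz=1.0e9)

if __name__ == "__main__":
    print(f"1.7 GHz, CPI 2.0: {fast_clock:.3f} s")
    print(f"1.0 GHz, CPI 1.0: {slow_clock:.3f} s")
    print("Lower-clocked design is faster:", slow_clock < fast_clock)
```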


Pitfall

“Neglecting the cost of software in either evaluating a system or examining cost-performance.”


Assignment

• Readings

– Wednesday

  • H&P: App. B.1-B.7

  • V&L: Ch. 2

  • Turner & Zar VHDL concepts tutorial

– Monday

  • Labor Day, no class meeting