Computer Systems
Architecture I
CSE 560M
Lecture 2
Prof. Patrick Crowley
Plan for Today
• Questions
• Administrivia
• Class background
• Today’s discussion
• Assignment
Administrivia
• My office hours:
– No one “good” time
– By appointment, times available each day
• Shakir’s office hours
– M&W, 5:30pm-6:30pm
– Bryan 422
2009 Class Background
[Chart: share of students (0% to 100%) reporting much, some, or no background in Architecture, Digital Design, and VHDL]
Introduction
“Speed is not everything but it’s kilometers
ahead of whatever is in second place”
Ed McCreight
The Dragon Computer System
Xerox PARC September, 1984
Computer Design: Make the
Common Case Fast
Amdahl’s law (speedup, S):
$$S = \frac{\text{Performance for task with enhancement}}{\text{Performance for task without enhancement}}$$
$$S = \frac{\text{Exec. time without enhancement}}{\text{Exec. time with enhancement}}$$
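As a worked illustration, the speedup ratio is commonly expanded in terms of the fraction of execution time that an enhancement actually touches. The sketch below assumes that standard form; the names `fraction_enhanced` and `speedup_enhanced` are illustrative and do not appear on the slide.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup S when only part of the execution time is sped up.

    fraction_enhanced: share of the original execution time that benefits (0..1).
    speedup_enhanced:  speedup of that share alone (> 0).
    Both parameter names are assumptions made for this sketch.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # New time = untouched part + enhanced part, relative to an old time of 1.
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


# Making the common case fast: the same 10x enhancement pays off far more
# when it applies to a larger share of the execution time.
for f in (0.1, 0.5, 0.9):
    print(f"fraction={f:.1f}  overall speedup={amdahl_speedup(f, 10.0):.2f}")
```

Even with a 10x enhancement, the overall speedup stays close to 1 when the enhanced fraction is small, which is the point of the slide title.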
Outline
• Types of computer systems
• Technology trends
• Explaining processor performance improvements
• Performance evaluation
• Fallacies and Pitfalls
Classes of Computer Systems
• Desktop
– Intel IA-32, AMD, IBM PowerPC
• Servers
– Intel IA-32, Intel IA-64, Sun SPARC, AMD
• Embedded
– MIPS, ARM, NEC, Motorola
Classes of Computer Systems
• Consider one metric: worldwide unit sales
• 1980: 724K PCs
• 1986: 9M PCs
• 2002: 130M PCs
• 2002: 3500 Intel Itanium processors
• 2002: 36M game consoles
• 2002: 480M mobile phones
• 2004
– 178M PCs
– 600M mobile phones
• 2006
– 230M PCs
– 960M mobile phones
• 2008
– 299M PCs
– 1.2B mobile phones
[Chart: worldwide unit sales in millions (0 to 1200) of PCs and mobile phones for 2004, 2006, and 2008]
Technology Trends
• CPU/Microprocessor
– Transistor count increases about 55% per year
– Performance improvement has traditionally been better than that
• Memory
– Density increases, bandwidth increases, access time is stagnant (although new memory architectures help)
• I/O
– Disk density: 100% per year!
– Disk access time: 30% improvement in 10 years
– Networks: periodic order-of-magnitude increases
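To get a feel for what these annual rates imply, here is a small compound-growth sketch. The 55% and 100% rates come from the slide; the 10-year horizon and the function names are arbitrary choices for illustration.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplicative growth after `years` at a compound annual rate."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed to double at a compound annual rate (rate > 0)."""
    if annual_rate <= 0:
        raise ValueError("annual_rate must be positive")
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Rates quoted on the slide; the 10-year horizon is an assumption.
for label, rate in (("transistor count", 0.55), ("disk density", 1.00)):
    print(
        f"{label}: doubles every {doubling_time_years(rate):.2f} years, "
        f"grows {growth_factor(rate, 10):.0f}x in 10 years"
    )
```

At 55% per year the count doubles roughly every 1.6 years, while a 100% rate doubles every year by definition.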
Computer Generations
Generation   Date        Technology
1            1950-1959   Vacuum Tubes
2            1960-1968   Transistors
3            1969-1977   Integrated Circuit
4            1978-1999   LSI, VLSI
5            2000-20xx   VLSI …
Processor-Memory Perf. Gap
Explaining Processor Improvements
• Technology
– Faster clock
– More transistors
• Architecture
– Extensive pipelining
– More transistors enable new functionality
• Multiple functional units
• Superscalar execution
• Out-of-order execution
– On-chip caches, TLBs
– Instruction fetch units, branch prediction
– Multiple cores, thread contexts
– Greater on-chip integration
Clock Rate
Chip               Date   Clock Freq. (MHz)   Clock Period (ns)
Intel 8086         1978   4.77                200
Intel 386          1985   40.00               25
DEC Alpha (v1)     1990   100.00              10
DEC Alpha (v2)     1994   300.00              3.33
Intel P4           2002   2,000.00            0.50
Intel Xeon L7455   2008   2,130.00            0.46
Time for signal to travel 1 cm on-chip: ~1.00 ns
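The clock period is the reciprocal of the clock frequency, so the period column can be recomputed from the frequency column. The sketch below does that for the frequencies listed on the slide; the function name is illustrative, and small differences from the slide's figures (for example at 4.77 MHz) suggest the slide's values are rounded.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Return the clock period in nanoseconds for a frequency given in MHz."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    # 1 MHz = 1e6 cycles per second, so period = 1 / (f * 1e6) s = 1000 / f ns.
    return 1000.0 / freq_mhz


# Clock frequencies (MHz) as listed on the slide.
chips = {
    "Intel 8086": 4.77,
    "Intel 386": 40.0,
    "DEC Alpha (v1)": 100.0,
    "DEC Alpha (v2)": 300.0,
    "Intel P4": 2000.0,
    "Intel Xeon L7455": 2130.0,
}

for name, mhz in chips.items():
    print(f"{name:18s} {mhz:9.2f} MHz  {clock_period_ns(mhz):8.2f} ns")
```

Once the period drops below about 1 ns, it is shorter than the slide's estimate of the time a signal needs to cross 1 cm of chip.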
Intel x86 Progression
Clock rate stagnates, cores increase
Chip          Date     Transistor Count   Speed (MHz)
4004          Nov-71   2300               0.108
8008          Apr-72   3500               0.2
8080          Apr-74   6000               2
8086          Jun-78   29000              10
8088          Jun-79   29000              10
286           Feb-82   134000             12.5
386           Oct-85   275000             33
486           Apr-89   1.2M               50
Pentium       Mar-93   3.1M               66
Pentium Pro   Mar-95   5.5M               166
Pentium II    Jul-97   7.5M               300
Pentium III   Feb-99   9.5M               1200
Pentium 4     Nov-00   42M                1800
P4-HT         Nov-02   188M               3060
Pentium D     May-05   169M               2800
Core 2 Duo    Jul-06   291M               3000
Xeon L7455    Jul-08   1900M              2130
Xeon E5450    Jan-09   731M               2530
Performance Evaluation Basics
• Performance is inversely proportional to execution time
• Elapsed time includes:
– User + system; I/O; memory accesses
• CPU time includes:
– User + system CPU (no I/O)
• CPU execution time for a single program execution:
– Cycles Per Instruction (CPI)
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
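A minimal sketch of the CPU time equation, using the clock rate in place of the clock cycle time (cycle time = 1 / clock rate). The instruction count, CPI, and clock rate in the example are made-up values for illustration, not measurements.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time.

    The clock cycle time is taken as 1 / clock_hz.
    """
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    clock_cycle_time = 1.0 / clock_hz
    return instruction_count * cpi * clock_cycle_time


# Hypothetical program: 1 billion instructions, CPI of 1.5, 2 GHz clock.
t = cpu_time_seconds(instruction_count=1e9, cpi=1.5, clock_hz=2e9)
print(f"CPU time: {t:.3f} s")
```

Because the three factors multiply, halving any one of them (fewer instructions, lower CPI, or a shorter cycle) halves the CPU time, provided the other two stay fixed.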
Components of CPI
• Ideal CPI = 1
• Classes of instructions
– RISC machines: ALU, control flow, f.p., load-store
– CISC machines: string instructions
• We will discuss “contributions to CPI from”:
– Memory hierarchy
– Branches (misprediction)
– Pipeline hazards
Components of CPI
$$\text{CPI} = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times CPI_i$$
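The overall CPI is a weighted average of the per-class CPIs, weighted by how often each instruction class executes. The sketch below assumes a hypothetical instruction mix; the class names echo the RISC classes from the earlier slide, but the counts and per-class CPIs are invented for illustration.

```python
def overall_cpi(mix: dict[str, tuple[int, float]]) -> float:
    """Weighted CPI: sum over classes of (IC_i / total instruction count) * CPI_i.

    `mix` maps a class name to (instruction count IC_i, CPI_i).
    """
    total_count = sum(count for count, _ in mix.values())
    if total_count == 0:
        raise ValueError("instruction mix is empty")
    return sum(count / total_count * cpi for count, cpi in mix.values())


# Hypothetical mix: (dynamic instruction count, cycles per instruction).
mix = {
    "alu": (500, 1.0),
    "load-store": (300, 2.0),
    "control flow": (200, 3.0),
}
print(f"Overall CPI: {overall_cpi(mix):.2f}")
```

With this mix the total is (500 x 1 + 300 x 2 + 200 x 3) / 1000 cycles per instruction, so the less frequent but slower classes pull the average well above the ideal CPI of 1.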
Measuring/Modeling CPU
Performance
• Hardware counters on a real CPU
• Instrumented execution of programs running on a real system
– Binary re-writing
– Debugger
• Instruction set simulation or interpretation
Benchmarks
• Desktop
– SPEC, integer and floating-point
– Commercial workloads: SYSmark, Winstone
• Servers
– SPECweb
– TPC-A, B, C
• Embedded
– EEMBC
– Other application-specific suites
Important Future Topics
• Computer Architecture Methodology
• Pipelining
• Locality
Fallacy
“The relative performance of two processors
with the same instruction set architecture
(ISA) can be judged by clock rate or by the
performance of a single benchmark suite.”
Intel P4 (1.7GHz) vs P3 (1GHz)
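One way to see why clock rate alone is misleading: with the same ISA and the same binary, the instruction count is equal on both machines, so relative performance depends on CPI as well as clock rate. The sketch below uses the two clock rates from the slide, but the CPI values are purely hypothetical and are not measurements of either processor.

```python
def relative_performance(clock_a_hz: float, cpi_a: float,
                         clock_b_hz: float, cpi_b: float) -> float:
    """Performance of A relative to B for the same instruction count.

    Time is proportional to CPI / clock rate, and performance is 1 / time,
    so perf_A / perf_B = (clock_a / cpi_a) / (clock_b / cpi_b).
    """
    return (clock_a_hz / cpi_a) / (clock_b_hz / cpi_b)


P4_CLOCK_HZ = 1.7e9  # clock rate from the slide
P3_CLOCK_HZ = 1.0e9  # clock rate from the slide

# Hypothetical CPIs for some workload (assumed, not measured).
for cpi_p4, cpi_p3 in ((1.0, 1.0), (1.7, 1.0), (2.0, 1.0)):
    ratio = relative_performance(P4_CLOCK_HZ, cpi_p4, P3_CLOCK_HZ, cpi_p3)
    print(f"CPI P4={cpi_p4:.1f}, CPI P3={cpi_p3:.1f}: P4 is {ratio:.2f}x the P3")
```

Under these assumptions the 1.7x clock advantage disappears once the faster-clocked machine needs 1.7x as many cycles per instruction, and reverses beyond that.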
Pitfall
“Neglecting the cost of software in either
evaluating a system or examining cost-
performance.”
Assignment
• Readings
– Wednesday
• H&P: App. B.1-B.7
• V&L: Ch. 2
• Turner & Zar VHDL concepts tutorial
– Monday
• Labor Day, no class meeting