7/27/2019 CS465Lec1n.m,,nmnm knlknlknklnklnknknl
1/63
CS 465 Computer Architecture, Fall 2009
Lecture 01: Introduction
Daniel Barbará (cs.gmu.edu/~dbarbara)
[Adapted from Computer Organization and Design, Patterson & Hennessy, 2005, UCB]
Course Administration
Instructor: Daniel Barbará
[email protected] Eng. Bldg.
Text (required): Computer Organization & Design: The Hardware/Software Interface, Patterson & Hennessy, 4th Edition
Grading Information
Grade determinants
Midterm Exam ~25%
Final Exam 1 ~35%
Homeworks ~40%
- Due at the beginning of class (or, if it's code to be submitted electronically, by 17:00 on the due date). No late assignments will be accepted.
Course prerequisites
grade of C or better in CS 367
Acknowledgements
Slides adopted from Dr. Zhong
Contributions from Dr. Setia
Slides also adopt materials from many other universities
IMPORTANT:
- Slides are not intended as a replacement for the text
- You spent the money on the book, please read it!
Course Topics (Tentative)
Instruction set architecture (Chapter 2)
MIPS
Arithmetic operations & data (Chapter 3)
System performance (Chapter 4)
Processor (Chapter 5)
Datapath and control
Pipelining to improve performance (Chapter 6)
Memory hierarchy (Chapter 7)
I/O (Chapter 8)
Focus of the Course
How computers work
- MIPS instruction set architecture
- The implementation of MIPS instruction set architecture: MIPS processor design
Issues affecting modern processors
- Pipelining: processor performance improvement
- Cache: memory system, I/O systems
Why Learn Computer Architecture?
You want to call yourself a computer scientist
Computer architecture impacts every other aspect of computer science
You need to make a purchasing decision or offer expert advice
You want to build software people use and sell many, many copies (need performance)
Both hardware and software affect performance
- Algorithm determines number of source-level statements
- Language/compiler/architecture determine machine instructions (Chapters 2 and 3)
- Processor/memory determine how fast instructions are executed (Chapters 5, 6, and 7)
- Assessing and understanding performance (Chapter 4)
Outline Today
Course logistics
Computer architectures overview
Trends in computer architectures
Computer Systems
Software
- Application software: Word Processors, Email, Internet Browsers, Games
- Systems software: Compilers, Operating Systems
Hardware
- CPU
- Memory
- I/O devices (mouse, keyboard, display, disks, networks, ...)
Software
[Figure: layers of software. Applications software (e.g., LaTeX) sits on top of systems software: operating systems (virtual memory, file system, I/O device drivers), assemblers (as), and compilers (gcc).]
Instruction Set Architecture
[Figure: the instruction set as the layer between software and hardware]
One of the most important abstractions is the ISA
- A critical interface between HW and SW
- Example: MIPS
Desired properties
- Convenience (from software side)
- Efficiency (from hardware side)
What is Computer Architecture?
Programmer's view: a pleasant environment
Operating system's view: a set of resources (hw & sw)
System architecture view: a set of components
Compiler's view: an instruction set architecture with OS help
Microprocessor architecture view: a set of functional units
VLSI designer's view: a set of transistors implementing logic
Mechanical engineer's view: a heater!
What is Computer Architecture?
Patterson & Hennessy: Computer architecture = Instruction set architecture + Machine organization + Hardware
For this course, computer architecture mainly refers to ISA (Instruction Set Architecture)
- Programmer-visible, serves as the boundary between the software and hardware
- Modern ISA examples: MIPS, SPARC, PowerPC, DEC Alpha
Organization and Hardware
Organization: high-level aspects of a computer's design
- Principal components: memory, CPU, I/O, ...
- How components are interconnected
- How information flows between components
- E.g. AMD Opteron 64 and Intel Pentium 4: same ISA but different organizations
Hardware: detailed logic design and the packaging technology of a computer
- E.g. Pentium 4 and Mobile Pentium 4: nearly identical organizations but different hardware details
Types of computers and their applications
Desktop
Run third-party software
Office to home applications
30 years old
Servers
Modern version of what used to be called mainframes, minicomputers and supercomputers
Large workloads
Built using the same technology as desktops but with higher capacity
- Expandable
- Scalable
- Reliable
Large spectrum: from low-end (file storage, small businesses) to supercomputers (high-end scientific and engineering applications)
- Gigabytes to Terabytes to Petabytes of storage
Examples: file servers, web servers, database servers
Where is the Market?
[Figure: bar chart of millions of computers sold per year, 1998 to 2002, by category]
Year   Embedded   Desktop   Servers
1998   290        93        3
1999   488        114       3
2000   892        135       4
2001   862        129       4
2002   1122       131       5
(values in millions of computers)
In this class you will learn
How programs written in a high-level language (e.g., Java) translate into the language of the hardware and how the hardware executes them.
The interface between software and hardware and how software instructs hardware to perform the needed functions.
The factors that determine the performance of a program.
The techniques that hardware designers employ to improve performance.
As a consequence, you will understand what features may make one computer design better than another for a particular application.
High-level to Machine Language
High-level language program (in C)
-> Compiler ->
Assembly language program (for MIPS)
-> Assembler ->
Binary machine language program (for MIPS)
Evolution
In the beginning there were only bits, and people spent countless hours trying to program in machine language
01100011001 011001110100
Finally, before everybody went insane, the assembler was invented: write in mnemonics called assembly language and let the assembler translate (a one-to-one translation)
Add A,B
This wasn't for everybody, obviously (imagine how modern applications would have been possible in assembly), so high-level languages were born (and with them compilers to translate to assembly, a many-to-one translation)
C= A*(SQRT(B)+3.0)
THE BIG IDEA
Levels of abstraction: each layer provides its own (simplified) view and hides the details of the next.
Instruction Set Architecture (ISA)
ISA: An abstract interface between the hardware and the lowest level software of a machine that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.
"... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." (Amdahl, Blaauw, and Brooks, 1964)
Enables implementations of varying cost and performance to run identical software
ABI (application binary interface): The user portion of the instruction set plus the operating system interfaces used by application programmers. Defines a standard for binary portability across computers.
ISA Type Sales
[Figure: stacked bar chart of millions of processors sold per year, 1998 to 2002 (axis from 0 to 1400), by ISA: ARM, IA-32, MIPS, Motorola 68K, PowerPC, Hitachi SH, SPARC, Other]
PowerPoint "comic" bar chart with approximate values (see text for correct values)
Organization of a computer
PC Motherboard Closeup
Inside the Pentium 4
Moore's Law
In 1965, Gordon Moore predicted that the number of transistors that can be integrated on a die would double every 18 to 24 months (i.e., grow exponentially with time).
Amazingly visionary: the million transistor/chip barrier was crossed in the 1980s.
2300 transistors, 1 MHz clock (Intel 4004) - 1971
16 Million transistors (Ultra Sparc III)
42 Million transistors, 2 GHz clock (Intel Xeon) - 2001
55 Million transistors, 3 GHz, 130nm technology, 250mm^2 die (Intel Pentium 4) - 2004
140 Million transistors (HP PA-8500)
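As a rough illustration of the doubling rule, the sketch below projects a transistor count forward from the Intel 4004 figure listed on this slide. The function name `projected_transistors` and the idea of a fixed, constant doubling period are assumptions made for illustration; real chips do not track the curve exactly.

```python
def projected_transistors(start_count, years, doubling_months):
    """Project a transistor count, assuming one doubling every `doubling_months`."""
    doublings = (years * 12) / doubling_months
    return start_count * 2 ** doublings

# Intel 4004 (1971, 2300 transistors) projected 30 years ahead, to 2001.
slow = projected_transistors(2300, 30, 24)  # doubling every 24 months
fast = projected_transistors(2300, 30, 18)  # doubling every 18 months

print(f"24-month doubling: {slow:,.0f} transistors")
print(f"18-month doubling: {fast:,.0f} transistors")
```

With a 24-month period the projection is about 75 million transistors, the same order of magnitude as the 42 million listed for the 2001 Xeon; the 18-month period gives about 2.4 billion, which overshoots considerably.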
Processor Performance Increase
[Figure: performance (SPECint, log scale from 1 to 10000) versus year, 1987 to 2003. Machines plotted: SUN-4/260, MIPS M/120, MIPS M2000, IBM RS6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600, DEC Alpha 21264A/667, Intel Xeon/2000, Intel Pentium 4/3000]
Trend: Microprocessor Capacity
[Figure: transistors per chip (log scale from 1000 to 100000000) versus year, 1970 to 2000, following Moore's Law. Chips plotted: i4004, i8080, i8086, i80286, i80386, i80486, Pentium]
CMOS improvements:
- Die size: 2X every 3 yrs
- Line width: halve / 7 yrs
Transistor counts:
- Itanium II: 241 million
- Pentium 4: 55 million
- Alpha 21264: 15 million
- Pentium Pro: 5.5 million
- PowerPC 620: 6.9 million
- Alpha 21164: 9.3 million
- Sparc Ultra: 5.2 million
Trend: Microprocessor Performance
Moore's Law
"Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
- # of transistors per cost-effective integrated circuit doubles every 18 months
- Transistor capacity doubles every 18-24 months
- Speed 2x / 1.5 years (since '85); 100X performance in last decade
Trend: Microprocessor Performance
Memory
Dynamic Random Access Memory (DRAM)
- The choice for main memory
- Volatile (contents go away when power is lost)
- Fast
- Relatively small
- DRAM capacity: 2x / 2 years (since '96); 64x size improvement in last decade
Static Random Access Memory (SRAM)
- The choice for cache
- Much faster than DRAM, but less dense and more costly
Magnetic disks
- The choice for secondary memory
- Non-volatile
- Slower
- Relatively large
- Capacity: 2x / 1 year (since '97); 250X size in last decade
Solid state (Flash) memory
- The choice for embedded computers
- Non-volatile
Memory
Optical disks
Removable, therefore very large
Slower than disks
Magnetic tape
Even slower
Sequential (non-random) access
The choice for archival
DRAM Capacity Growth
[Figure: DRAM capacity per chip (Kbit, log scale from 10 to 1000000) versus year of introduction, 1976 to 2002. Generations plotted: 16K, 64K, 256K, 1M, 4M, 16M, 64M, 128M, 256M, 512M]
Trend: Memory Capacity
Growth of capacity per chip
[Figure: chip size (log scale from 1000 to 1000000000) versus year, 1970 to 2000]
year   size (Mbit)
1980   0.0625
1983   0.25
1986   1
1989   4
1992   16
1996   64
1998   128
2000   256
2002   512
2006   2048
Now 1.4X/yr, or 2X every 2 years.
More than 10000X since 1980!
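The growth figures quoted on this slide can be checked against the table with a few lines of arithmetic. The sketch below is only an illustration; the helper name `annual_growth` is an assumption, and the sizes are the Mbit values from the table.

```python
def annual_growth(start_size, end_size, years):
    """Constant yearly growth factor that turns start_size into end_size."""
    return (end_size / start_size) ** (1 / years)

# Total growth per chip from 1980 (0.0625 Mbit) to 2006 (2048 Mbit).
total_factor = 2048 / 0.0625

# Recent growth: 64 Mbit in 1996 to 2048 Mbit in 2006.
recent_rate = annual_growth(64, 2048, 2006 - 1996)

print(f"1980 to 2006: {total_factor:,.0f}X")
print(f"1996 to 2006: {recent_rate:.2f}X per year, "
      f"{recent_rate ** 2:.2f}X every 2 years")
```

The 1996 to 2006 span is exactly five doublings in ten years, which is where "1.4X/yr, or 2X every 2 years" comes from.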
Dramatic Technology Change
State-of-the-art PC when you graduate (at least):
Processor clock speed: 5000 MegaHertz (5.0 GigaHertz)
Memory capacity: 4000 MegaBytes (4.0 GigaBytes)
Disk capacity: 2000 GigaBytes (2.0 TeraBytes)
New units! Mega => Giga, Giga => Tera
(Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta = 10^24)
Come up with a clever mnemonic, fame!
Example Machine Organization
Workstation design target
25% of cost on processor
25% of cost on memory (minimum memory size)
Rest on I/O devices, power supplies, box
[Figure: block diagram of a computer: the CPU (control and datapath), memory, and devices (input and output)]
MIPS R3000 Instruction Set Architecture
Instruction Categories
- Load/Store
- Computational
- Jump and Branch
- Floating Point (coprocessor)
- Memory Management
- Special
Registers
- R0 - R31
- PC
- HI
- LO
3 Instruction Formats: all 32 bits wide
- OP | rs | rt | rd | sa | funct
- OP | rs | rt | immediate
- OP | jump target
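To make the first of the three formats concrete, here is a minimal sketch that packs the OP, rs, rt, rd, sa and funct fields into one 32-bit word. The slide does not give field widths or register numbers; the widths used here (6, 5, 5, 5, 5, 6 bits), the register numbers ($t0 = 8, $s1 = 17, $s2 = 18) and the `add` funct code 0x20 are the standard MIPS conventions, and the function name `encode_r_type` is hypothetical.

```python
def encode_r_type(rs, rt, rd, shamt, funct, op=0):
    """Pack R-format fields (widths 6/5/5/5/5/6 bits) into a 32-bit word."""
    fields = ((op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"field value {value} does not fit in {bits} bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $s1, $s2  ->  rd = $t0 (8), rs = $s1 (17), rt = $s2 (18)
word = encode_r_type(rs=17, rt=18, rd=8, shamt=0, funct=0x20)
print(f"{word:#010x}")
print(f"{word:032b}")
```

The same shifting and masking idea extends to the other two formats, with a 16-bit immediate or a 26-bit jump target in place of the lower fields.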
Defining Performance
Which airplane has the best performance?
[Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 on passenger capacity (0 to 500), cruising range (0 to 10000 miles), cruising speed (0 to 1500 mph), and passengers x mph (0 to 400000)]
Response Time and Throughput
Response time
How long it takes to do a task
Throughput
Total work done per unit time
- e.g., tasks/transactions/... per hour
How are response time and throughput affected by
Replacing the processor with a faster version?
Adding more processors?
We'll focus on response time for now
Relative Performance
Define Performance = 1/Execution Time
"X is n times faster than Y":
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
Example: time taken to run a program
10s on A, 15s on B
$$\frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$$
So A is 1.5 times faster than B
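The definition above reduces to a single division. A minimal sketch, where the function name `times_faster` is an assumption for illustration:

```python
def times_faster(time_x, time_y):
    """Return n such that X is n times faster than Y, given execution times."""
    return time_y / time_x

# Program takes 10 s on A and 15 s on B.
n = times_faster(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")
```

Note the inversion: because performance is the reciprocal of execution time, the slower machine's time goes in the numerator.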
Measuring Execution Time
Elapsed time
Total response time, including all aspects
- Processing, I/O, OS overhead, idle time
Determines system performance
CPU time
Time spent processing a given job
- Discounts I/O time, other jobs' shares
Comprises user CPU time and system CPU time
Different programs are affected differently by CPU and system performance
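The difference between elapsed time and CPU time can be observed directly. The sketch below uses Python's standard `time.perf_counter` (wall-clock time) and `time.process_time` (user plus system CPU time of the current process); the helper names `measure` and `mostly_idle` are assumptions for illustration.

```python
import time

def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds)."""
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    task()
    cpu = time.process_time() - start_cpu
    elapsed = time.perf_counter() - start_wall
    return elapsed, cpu

def mostly_idle():
    time.sleep(0.2)       # idle time: counts toward elapsed time only
    sum(range(100_000))   # a small amount of actual computation

elapsed, cpu = measure(mostly_idle)
print(f"elapsed: {elapsed:.3f} s, CPU: {cpu:.3f} s")
```

For a job like this one, which spends most of its time waiting, elapsed time is much larger than CPU time; for a compute-bound job the two are close.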
CPU Clocking
Operation of digital hardware governed by a constant-rate clock
[Figure: clock timing diagram; each clock period covers data transfer and computation, followed by a state update]
Clock period: duration of a clock cycle
- e.g., 250ps = 0.25ns = 250×10^-12 s
Clock frequency (rate): cycles per second
- e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz
CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
Performance improved by
Reducing number of clock cycles
Increasing clock rate
Hardware designer must often trade off clock rate against cycle count
CPU Time Example
Computer A: 2GHz clock, 10s CPU time
Designing Computer B
Aim for 6s CPU time
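Can do faster clock, but causes 1.2 × clock cycles
How fast must Computer B clock be?
$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$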
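The same calculation, written out step by step as a small sketch. The variable names are assumptions chosen to mirror the symbols in the example.

```python
# Computer A: 2 GHz clock, 10 s CPU time.
cpu_time_a = 10.0      # seconds
clock_rate_a = 2e9     # Hz

# Clock Cycles = CPU Time x Clock Rate
clock_cycles_a = cpu_time_a * clock_rate_a

# Computer B needs 1.2x as many cycles and must finish in 6 s.
clock_cycles_b = 1.2 * clock_cycles_a
cpu_time_b = 6.0       # seconds

# Clock Rate = Clock Cycles / CPU Time
clock_rate_b = clock_cycles_b / cpu_time_b

print(f"Clock cycles on A: {clock_cycles_a:.3g}")
print(f"Required clock rate for B: {clock_rate_b / 1e9:.1f} GHz")
```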
Instruction Count and CPI
Instruction Count for a program
Determined by program, ISA and compiler
Average cycles per instruction
Determined by CPU hardware
If different instructions have different CPI
- Average CPI affected by instruction mix
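$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$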
CPI Example
Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
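With I the instruction count (the same for both, since the ISA is the same):
$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
A is faster, by this much:
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$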
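Because the instruction count is identical on both machines, it cancels, and the comparison only needs the time per instruction. A minimal sketch; the function name `time_per_instruction_ps` is an assumption.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time one instruction takes: CPI x clock cycle time."""
    return cpi * cycle_time_ps

time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)  # Computer A
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)  # Computer B

ratio = time_b / time_a
print(f"A: {time_a:.0f} ps per instruction, B: {time_b:.0f} ps per instruction")
print(f"A is {ratio:.1f} times faster than B")
```

A wins despite its higher CPI because its cycle time is half of B's.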
CPI in More Detail
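If different instruction classes take different numbers of cycles
$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
The fraction Instruction Count_i / Instruction Count is the relative frequency of class i.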
CPI Example
Alternative compiled code sequences using instructions in classes A, B, C
Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1
Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
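The two sequences can be evaluated with the weighted-sum formula for clock cycles. A small sketch, with the dictionary `CPI_BY_CLASS` and the function `cycles_and_cpi` as illustrative names:

```python
# CPI for each instruction class, from the table.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

def cycles_and_cpi(counts):
    """Return (total clock cycles, average CPI) for per-class instruction counts."""
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    instruction_count = sum(counts.values())
    return cycles, cycles / instruction_count

seq1 = cycles_and_cpi({"A": 2, "B": 1, "C": 2})
seq2 = cycles_and_cpi({"A": 4, "B": 1, "C": 1})

print(f"Sequence 1: {seq1[0]} cycles, average CPI {seq1[1]}")
print(f"Sequence 2: {seq2[0]} cycles, average CPI {seq2[1]}")
```

Sequence 2 executes more instructions but needs fewer cycles, so instruction count alone does not decide which sequence is faster.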
Performance Summary
The BIG Picture
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
Power Trends
In CMOS IC technology
$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
[Figure annotations: power ×30, voltage 5V → 1V, frequency ×1000]
Reducing Power
Suppose a new CPU has
85% of capacitive load of old CPU
15% voltage and 15% frequency reduction
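$$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{C_{\text{old}} \times 0.85 \times (V_{\text{old}} \times 0.85)^2 \times F_{\text{old}} \times 0.85}{C_{\text{old}} \times V_{\text{old}}^2 \times F_{\text{old}}} = 0.85^4 = 0.52$$
The power wall
We can't reduce voltage further
We can't remove more heat
How else can we improve performance?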
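Since power is proportional to capacitive load × voltage² × frequency, the ratio of new to old power depends only on the scaling factors. A minimal sketch; the function name `power_ratio` is an assumption.

```python
def power_ratio(cap_scale, volt_scale, freq_scale):
    """New power / old power when each factor is scaled by the given amount."""
    return cap_scale * volt_scale ** 2 * freq_scale

# 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = power_ratio(cap_scale=0.85, volt_scale=0.85, freq_scale=0.85)
print(f"New CPU uses {ratio:.2f} of the old CPU's power")
```

Voltage enters squared, which is why lowering the supply voltage was historically the most effective lever.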
Uniprocessor Performance
The Sea Change: The Switch to Multiprocessors
Constrained by power, instruction-level parallelism, memory latency
Multiprocessors
Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Compare with instruction level parallelism
- Hardware executes multiple instructions at once
- Hidden from the programmer
Hard to do
- Programming for performance
- Load balancing
- Optimizing communication and synchronization
SPEC CPU Benchmark
Programs used to measure performance
Supposedly typical of actual workload
Standard Performance Evaluation Corp (SPEC)
Develops benchmarks for CPU, I/O, Web, ...
SPEC CPU2006
Elapsed time to execute a selection of programs
- Negligible I/O, so focuses on CPU performance
Normalize relative to reference machine
Summarize as geometric mean of performance ratios
- CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
CINT2006 for Opteron X4 2356
Name        Description                    IC×10^9   CPI     Tc (ns)   Exec time   Ref time   SPECratio
perl        Interpreted string processing  2,118     0.75    0.40      637         9,777      15.3
bzip2       Block-sorting compression      2,389     0.85    0.40      817         9,650      11.8
gcc         GNU C Compiler                 1,050     1.72    0.40      724         8,050      11.1
mcf         Combinatorial optimization     336       10.00   0.40      1,345       9,120      6.8
go          Go game (AI)                   1,658     1.09    0.40      721         10,490     14.6
hmmer       Search gene sequence           2,783     0.80    0.40      890         9,330      10.5
sjeng       Chess game (AI)                2,176     0.96    0.40      837         12,100     14.5
libquantum  Quantum computer simulation    1,623     1.61    0.40      1,047       20,720     19.8
h264avc     Video compression              3,102     0.80    0.40      993         22,130     22.3
omnetpp     Discrete event simulation      587       2.94    0.40      690         6,250      9.1
astar       Games/path finding             1,082     1.79    0.40      773         7,020      9.1
xalancbmk   XML parsing                    1,058     2.70    0.40      1,143       6,900      6.0
Geometric mean                                                                                11.7
Note: the high CPI values reflect high cache miss rates
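The columns of this table are tied together by the CPU time equation and by the SPECratio definition (reference time divided by execution time). The sketch below recomputes two execution times and the geometric mean; the function names are assumptions, and the input numbers are the rounded values printed in the table, so results can differ from the table by a few seconds.

```python
import math

def exec_time_seconds(ic_billions, cpi, tc_ns):
    """(IC x 10^9) x CPI x (Tc x 10^-9 s); the powers of ten cancel."""
    return ic_billions * cpi * tc_ns

def geometric_mean(values):
    """n-th root of the product, computed with logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

perl_time = exec_time_seconds(2118, 0.75, 0.40)
mcf_time = exec_time_seconds(336, 10.00, 0.40)

# SPECratio column, in table order.
spec_ratios = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5,
               14.5, 19.8, 22.3, 9.1, 9.1, 6.0]
overall = geometric_mean(spec_ratios)

print(f"perl: {perl_time:.0f} s (table: 637), SPECratio {9777 / 637:.1f}")
print(f"mcf:  {mcf_time:.0f} s (table: 1,345), SPECratio {9120 / 1345:.1f}")
print(f"Geometric mean of SPECratios: {overall:.1f}")
```

The geometric mean is used instead of the arithmetic mean so that the summary does not depend on which machine is chosen as the reference.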
SPEC Power Benchmark
Power consumption of server at different workload levels
Performance: ssj_ops/sec
Power: Watts (Joules/sec)
$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$
SPECpower_ssj2008 for X4
Target Load %    Performance (ssj_ops/sec)    Average Power (Watts)
100%             231,867                      295
90%              211,282                      286
80%              185,803                      275
70%              163,427                      265
60%              140,160                      256
50%              118,324                      246
40%              92,035                       233
30%              70,500                       222
20%              47,126                       206
10%              23,066                       180
0%               0                            141
Overall sum      1,283,590                    2,605
Σssj_ops / Σpower                             493
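The overall figure is the sum of the performance column divided by the sum of the power column. A short sketch that recomputes it from the table; the list names are assumptions, and the 40% row is read as 92,035 ssj_ops, the value consistent with the stated overall sum.

```python
# Rows for target loads 100%, 90%, ..., 10%, 0%.
ssj_ops = [231_867, 211_282, 185_803, 163_427, 140_160, 118_324,
           92_035, 70_500, 47_126, 23_066, 0]
watts = [295, 286, 275, 265, 256, 246, 233, 222, 206, 180, 141]

total_ops = sum(ssj_ops)
total_watts = sum(watts)
overall = total_ops / total_watts

print(f"Sum of ssj_ops: {total_ops:,}")
print(f"Sum of power:   {total_watts:,} W")
print(f"Overall ssj_ops per Watt: {overall:.0f}")
```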
Pitfall: Amdahl's Law
Improving an aspect of a computer and expecting a proportional improvement in overall performance
$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$
Example: multiply accounts for 80s/100s
How much improvement in multiply performance to get 5× overall?
$$20 = \frac{80}{n} + 20$$
Can't be done!
Corollary: make the common case fast
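To see why a 5× overall speedup is out of reach in the multiply example, the sketch below evaluates Amdahl's Law for increasingly large improvement factors. The function name `improved_time` is an assumption for illustration.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Amdahl's Law: only the affected part of the time shrinks."""
    return t_affected / factor + t_unaffected

total = 100.0  # seconds: 80 s of multiply plus 20 s of everything else
for factor in (2, 10, 100, 1_000_000):
    t = improved_time(80.0, 20.0, factor)
    print(f"multiply {factor:>9,}x faster -> {t:7.3f} s, "
          f"overall speedup {total / t:.3f}x")
```

The overall speedup approaches 5× as the factor grows but never reaches it, because the 20 s of unaffected time remains.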
Fallacy: Low Power at Idle
Look back at X4 power benchmark
At 100% load: 295W
At 50% load: 246W (83%)
At 10% load: 180W (61%)
Google data center
Mostly operates at 10% to 50% load
At 100% load less than 1% of the time
Consider designing processors to make power proportional to load
Pitfall: MIPS as a Performance Metric
MIPS: Millions of Instructions Per Second
Doesn't account for
- Differences in ISAs between computers
- Differences in complexity between instructions
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
CPI varies between programs on a given CPU
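The last form of the equation makes the pitfall easy to demonstrate: on one CPU, the MIPS rating changes with the program's CPI. In the sketch below the function name `mips_rating`, the 4 GHz clock, and the two CPI values are hypothetical numbers chosen for illustration, not measurements.

```python
def mips_rating(clock_rate_hz, cpi):
    """MIPS = Clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

clock_rate = 4e9  # hypothetical 4 GHz CPU

# Two hypothetical programs on the same CPU with different average CPI.
program_1 = mips_rating(clock_rate, cpi=2.0)
program_2 = mips_rating(clock_rate, cpi=1.0)

print(f"Program 1 (CPI 2.0): {program_1:.0f} MIPS")
print(f"Program 2 (CPI 1.0): {program_2:.0f} MIPS")
```

The same hardware rates at two different MIPS values, so a single MIPS number says little without knowing the program and the ISA.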
Concluding Remarks
Cost/performance is improving
Due to underlying technology development
Hierarchical layers of abstraction
In both hardware and software
Instruction set architecture
The hardware/software interface
Execution time: the best performance measure
Power is a limiting factor
Use parallelism to improve performance