Computer Architecture I
Lecture Notes
Dr. Ali Muhtaroğlu
Fall 2009
METU Northern Cyprus Campus
References:
– Patterson & Hennessy, “Computer Organization and Design” (4th Ed.), Kaufmann, 2008.
– Stallings, “Computer Organization & Architecture” (7th Ed.), Pearson, 2006.
– Mano & Kime, “Logic and Computer Design Fundamentals” (4th Ed.), Prentice Hall, 2008.
– Brown & Vranesic, “Fundamentals of Digital Logic with VHDL Design” (2nd Ed.), McGraw Hill, 2005.
– Dr. Patterson’s and Dr. Mary Jane Irwin’s (Penn State) lecture notes
Introduction
Lecture 1
What computers were…
EDSAC, University of Cambridge, UK, 1949
What computers are…

Games, sensor nets, cameras, laptops, media players, robots,
smart phones, routers, servers, supercomputers, automobiles
What is Computer Architecture ?
Application

  Gap too large to bridge in one step
  (but there are exceptions, e.g. the magnetic compass)

Physics
In its broadest definition, computer architecture is the design of
the abstraction layers that allow us to implement information
processing applications efficiently using available manufacturing
technologies.
Abstraction Layers in Modern Computing Systems

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Gates/Register-Transfer Level (RTL)
Circuits
Devices
Physics

The original domain of the computer architect (’50s-’80s) centered on the
ISA; recent computer architecture (’90s) spans the layers from operating
systems/virtual machines down through circuits.
Computer Architecture vs. Computer Organization

• Architecture is those attributes visible to the programmer
  – Instruction set, number of bits used for data representation,
    I/O mechanisms, addressing techniques.
  – e.g. Is there a multiply instruction?
• Organization is how features are implemented
  – Control signals, interfaces, memory technology.
  – e.g. Is there a hardware multiply unit, or is multiplication done
    by repeated addition?

Caution: You may hear these terms used interchangeably in industry.
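The multiply example above can be made concrete with a minimal Python sketch (not part of the original notes; the function name is ours): two organizations can implement the same architecturally visible multiply instruction, one of them by repeated addition.

```python
def multiply_by_repeated_addition(a, b):
    """One possible *organization* for the architecture's multiply:
    no hardware multiplier, just an adder used |b| times."""
    result = 0
    for _ in range(abs(b)):
        result += a              # one addition per iteration
    return -result if b < 0 else result
```

To the programmer both organizations look identical: the multiply "instruction" exists and returns the same result; only cost (here, |b| additions) differs.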
Computer Architecture vs. Computer Organization

• All Intel x86 family members share the same basic architecture
• The IBM System/370 family shares the same basic architecture
• This provides backwards code compatibility
  – Not necessarily forward compatibility
• Organization differs between different versions of computers
• Hardware gets cheaper and more compact ⇒ do more in hardware
• Performance, power dissipation, and size factors also drive
  changes in organization
Historical: ENIAC - background

• Electronic Numerical Integrator And Computer
• Eckert and Mauchly, University of Pennsylvania
• Trajectory tables for weapons
• Started 1943, finished 1946
  – Too late for the war effort
• Used until 1955
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons, 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
von Neumann/Turing

• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing them
• Input and output equipment operated by the control unit
• Princeton Institute for Advanced Studies (IAS)
• Completed 1952
IAS - details

• 1000 x 40 bit words
  – Binary numbers
  – 2 x 20 bit instructions
• Set of registers (storage in CPU)
  – Memory Buffer Register
  – Memory Address Register
  – Instruction Register
  – Instruction Buffer Register
  – Program Counter
  – Accumulator
  – Multiplier Quotient
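The word layout can be illustrated with a small Python sketch (ours, not from the notes): each 40-bit IAS word that holds instructions packs two 20-bit instructions side by side.

```python
def split_ias_word(word):
    """Split a 40-bit IAS word into its two 20-bit instruction halves.
    We assume here the left instruction sits in the high-order bits."""
    assert 0 <= word < (1 << 40), "IAS words are 40 bits wide"
    left = (word >> 20) & 0xFFFFF    # upper 20 bits
    right = word & 0xFFFFF           # lower 20 bits
    return left, right
```

This is why the IAS needs an Instruction Buffer Register: after a fetch, one half executes while the other half waits.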
Components of a Computer

• 5 classic components of a computer
• Input, Output, Memory, Datapath, Control
• Independent of hardware technology
• Represents the past and the present
Generations of Computer

• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
  – Up to 100 devices on a chip
• Medium scale integration - to 1971
  – 100-3,000 devices on a chip
• Large scale integration - 1971-1977
  – 3,000-100,000 devices on a chip
• Very large scale integration - 1978-1991
  – 100,000-100,000,000 devices on a chip
• Ultra large scale integration - 1991 on
  – Over 100,000,000 devices on a chip
Moore’s Law

• Increased density of components on chip
• Gordon Moore - co-founder of Intel
• Predicted that the number of transistors on a chip would double every year
• Since the 1970s development has slowed a little
  – Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged

For the same structure/function:
• Higher packing density means shorter electrical paths, giving
  higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increase reliability
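The doubling rule is simple compound growth; a minimal Python sketch (our illustration, with a hypothetical function name) projects transistor counts under the 18-month doubling period mentioned above.

```python
def transistors(start_count, start_year, year, doubling_months=18):
    """Project transistor count under Moore's law: the count doubles
    every `doubling_months` months after `start_year`."""
    months = (year - start_year) * 12
    return start_count * 2 ** (months / doubling_months)

# e.g. 1,000 transistors in 1970 -> 4,000 by 1973 (two 18-month doublings)
```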
Growth in CPU Transistor Count
Speeding it up

• Pipelining
• On-board cache
• On-board L1 & L2 cache
• Branch prediction
• Data flow analysis
• Speculative execution
Performance Balance

• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed
Logic and Memory Performance Gap
Solutions to Logic/Memory Gap

• Increase number of bits retrieved at one time
  – Make DRAM “wider” rather than “deeper”
• Change DRAM interface
  – Cache
• Reduce frequency of memory access
  – More complex cache and cache on chip
• Increase interconnection bandwidth
  – High speed buses
  – Hierarchy of buses
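Why “wider rather than deeper” helps can be seen from the bandwidth arithmetic, sketched here in Python (our illustration; numbers are hypothetical, not from the notes): peak bandwidth scales with bus width even when the memory itself is no faster.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits, transfers_per_s):
    """Peak memory bandwidth = bits moved per transfer x transfers per
    second, converted to bytes."""
    return bus_width_bits * transfers_per_s / 8

# Doubling the width doubles bandwidth without faster DRAM cells:
narrow = peak_bandwidth_bytes_per_s(64, 100e6)    # 64-bit bus
wide = peak_bandwidth_bytes_per_s(128, 100e6)     # 128-bit bus
```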
Typical I/O Device Data Rates
I/O Devices

• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this
• The problem is moving the data
• Solutions:
  – Caching
  – Buffering
  – Higher-speed interconnection buses
  – More elaborate bus structures
  – Multiple-processor configurations
Improvements in Chip Organization and Architecture

• Increase hardware speed of processor
  – Fundamentally due to shrinking logic gate size
    • More gates, packed more tightly, increasing clock rate
    • Propagation time for signals reduced
• Increase size and speed of caches
  – Dedicating part of processor chip
    • Cache access times drop significantly
• Change processor organization and architecture
  – Increase effective speed of execution
  – Parallelism
Problems with Clock Speed and Logic Density

• Power
  – Power density increases with density of logic and clock speed
  – Dissipating the heat becomes difficult
• RC delay
  – Speed at which electrons flow is limited by the resistance and
    capacitance of the metal wires connecting them
  – Delay increases as the RC product increases
  – Wire interconnects get thinner, increasing resistance
  – Wires get closer together, increasing capacitance
• Memory latency
  – Memory speeds lag processor speeds
• Solution:
  – More emphasis on organizational and architectural approaches
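The RC-delay point above can be sketched with first-order arithmetic in Python (our illustration; the values are hypothetical): since both R and C grow as wires shrink, the delay product compounds.

```python
def wire_delay(resistance_ohms, capacitance_farads):
    """First-order interconnect delay estimate: delay grows with the
    RC product of the wire."""
    return resistance_ohms * capacitance_farads

# If scaling roughly doubles resistance (thinner wires) and doubles
# capacitance (closer spacing), wire delay grows ~4x:
before = wire_delay(100.0, 1e-13)
after = wire_delay(200.0, 2e-13)
```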
Intel Microprocessor Performance
Increased Cache Capacity

• Typically two or three levels of cache between processor and main memory
• Chip density increased
  – More cache memory on chip
  – Faster cache access
• The Pentium chip devoted about 10% of chip area to cache
• The Pentium 4 devotes about 50%
More Complex Execution Logic

• Enable parallel execution of instructions
• Pipeline works like an assembly line
  – Different stages of execution of different instructions at the
    same time along the pipeline
• Superscalar allows multiple pipelines within a single processor
  – Instructions that do not depend on one another can be
    executed in parallel
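The assembly-line analogy has a standard back-of-the-envelope formula, sketched here in Python (our illustration, assuming one cycle per stage and no hazards): with many instructions in flight, speedup approaches the number of stages.

```python
def pipeline_speedup(n_stages, n_instructions):
    """Ideal pipeline speedup: unpipelined time vs. pipelined time,
    assuming one cycle per stage and no stalls."""
    unpipelined = n_stages * n_instructions
    # Fill the pipeline (n_stages cycles), then complete one
    # instruction per cycle afterwards.
    pipelined = n_stages + (n_instructions - 1)
    return unpipelined / pipelined
```

For a 5-stage pipeline and a long instruction stream the speedup is just under 5x; hazards and stalls reduce it further in practice.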
Diminishing Returns

• Internal organization of processors is complex
  – Can get a great deal of parallelism
  – Further significant increases likely to be relatively modest
• Benefits from cache are reaching a limit
• Increasing clock rate runs into the power dissipation problem
  – Some fundamental physical limits are being reached
Uniprocessor Performance

[Figure: Performance relative to the VAX-11/780, 1978-2006, log scale
(1 to 10000). From Hennessy and Patterson, Computer Architecture:
A Quantitative Approach, 4th edition, October 2006.]

• VAX: 25%/year, 1978 to 1986
• RISC + x86: 52%/year, 1986 to 2002
• RISC + x86: ??%/year, 2002 to present
Uniprocessor Performance

The End of the Uniprocessor Era for High Performance Computing!

[Same figure as the previous slide: performance relative to the
VAX-11/780, 1978-2006; 25%/year to 1986, 52%/year to 2002,
??%/year since.]
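Those annual growth rates compound dramatically; a minimal Python sketch (ours, not from the notes) shows the cumulative effect over each era.

```python
def cumulative_speedup(annual_rate, years):
    """Compound annual performance growth; annual_rate is fractional,
    e.g. 0.52 for 52% per year."""
    return (1 + annual_rate) ** years

# 1978-1986 at 25%/year: about 6x overall.
# 1986-2002 at 52%/year: roughly 800x overall.
vax_era = cumulative_speedup(0.25, 8)
risc_era = cumulative_speedup(0.52, 16)
```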
Déjà vu all over again?

• Multiprocessors were imminent in the 1970s, ’80s, ’90s, …
• “… today’s processors … are nearing an impasse as technologies approach the speed of light..”
  David Mitchell, The Transputer: The Time Is Now (1989)
• Transputer was premature
  ⇒ Custom multiprocessors tried to beat uniprocessors
  ⇒ Procrastination rewarded: 2X sequential performance / 1.5 years
• “We are dedicating all of our future product development to multicore designs. … This is a sea change in computing”
  Paul Otellini, President, Intel (2004)
• The difference now is that all microprocessor companies have switched to multiprocessors (AMD, Intel, IBM, Sun; all new Apples 2+ CPUs)
  ⇒ Procrastination penalized: 2X sequential performance / 5 years
  ⇒ Biggest programming challenge: going from 1 to 2 CPUs
Multiple Cores

• Multiple processors on a single chip
  – Large shared cache
• Within a processor, the increase in performance is roughly proportional to the square root of the increase in complexity
• If software can use multiple processors, doubling the number of processors almost doubles performance
• So, use two simpler processors on the chip rather than one more complex processor
• With two processors, larger caches are justified
  – Power consumption of memory logic is less than that of processing logic
• Example: IBM POWER4
  – Two cores based on PowerPC
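The square-root relationship above (often called Pollack's rule) makes the trade-off easy to quantify; here is a minimal Python sketch (our illustration) comparing one complex core against two simple ones of the same total area.

```python
import math

def core_performance(complexity):
    """Single-core performance grows roughly as the square root of
    core complexity (Pollack's rule)."""
    return math.sqrt(complexity)

# Same silicon budget, two designs:
single = core_performance(2)        # one core, 2x complexity: ~1.41x perf
dual = 2 * core_performance(1)      # two simple cores: 2x perf,
                                    # IF software can use both
```

The catch is the "if": the dual-core win assumes parallel software, which is exactly the sea-change problem discussed later.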
Some physics…

• Dynamic power dissipation is C·V²·f, where
  – C is the effective switching capacitance while running an application
  – V is the operating voltage
  – f is the switching frequency
• Also, f is roughly proportional to V

Some math…

• Assume a single-core processor with C = 10 nF, V = 1.2 V, f = 2 GHz
  Power = C·V²·f = 28.8 W
• What if we get to 2.4 GHz by increasing V to 1.3 V?
  Power = 40.6 W
• What if we have a 2-core processor operating at 1.6 GHz and 1.1 V?
  Power = 38.7 W
• What if we now build the original chip to be less complex, i.e. with smaller C?
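The arithmetic on this slide can be checked directly; a minimal Python sketch (ours) evaluates P = C·V²·f for each of the three scenarios.

```python
def dynamic_power_watts(c_farads, v_volts, f_hz):
    """Dynamic power dissipation: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz

# Numbers from the slide:
base = dynamic_power_watts(10e-9, 1.2, 2.0e9)        # single core: 28.8 W
faster = dynamic_power_watts(10e-9, 1.3, 2.4e9)      # higher V and f: ~40.6 W
dual = 2 * dynamic_power_watts(10e-9, 1.1, 1.6e9)    # two slower cores: ~38.7 W
```

Note the cubic leverage: because f tracks V, raising frequency costs roughly V³, while two slower, lower-voltage cores deliver more aggregate throughput per watt.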
POWER4 Chip Organization
Problems with the “sea change”?

• Algorithms, programming languages, compilers, operating systems,
  architectures, libraries, … are not ready to supply Thread-Level
  Parallelism or Data-Level Parallelism for 1000 CPUs / chip
• Architectures are not ready for 1000 CPUs / chip
  – Unlike Instruction-Level Parallelism, this cannot be solved by
    computer architects and compiler writers alone, but it also
    cannot be solved without the participation of architects
• Need a reworking of all the abstraction layers in the
  computing system stack
Abstraction Layers in Modern Computing Systems

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Gates/Register-Transfer Level (RTL)
Circuits
Devices
Physics

The original domain of the computer architect (’50s-’80s) and the domain
of recent computer architecture (’90s) are now being reworked across all
layers, driven by parallel computing, security, reliability, and power:
a reinvigoration of computer architecture, mid-2000s onward.
EEE-445

Our goal will be to acquire a good understanding of:
- Computer system components
- Instruction Set Architecture (ISA) design
- Single-cycle and multi-cycle hardware organization/design to
  support an ISA
- We will do limited exercises on defining hardware through:
  - Software emulators
  - VHDL/schematic capture description and simulation

EEE-446 (next term) will complete the missing theoretical pieces to
obtain a solid Computer Architecture background:
- Bit slicing
- Arithmetic algorithms and how they are implemented in hardware
- More advanced memory and I/O concepts
- Pipelining and parallel processing concepts

In addition you will get some practical experience in the lab by applying
what you learned in the EEE-445/446 sequence to design your own CPU.