Structure – Top Level
[Figure: top-level structure of a computer. Labels: Computer, Central Processing Unit, Main Memory, Input/Output, Systems Interconnection; outside the computer: Peripherals, Communication lines.]
Structure - CPU
[Figure: structure of the CPU. Labels: Registers, Arithmetic and Logic Unit, Control Unit, Internal CPU Interconnection; the CPU sits on the System Bus alongside I/O and Memory.]
• CPU – controls the operation of the computer
• Components of CPU
– Control Unit – controls the operation of the CPU
– Arithmetic Logic Unit (ALU) – performs data-processing functions, e.g. calculations
– Internal CPU Interconnection – provides communication between the control unit, registers and ALU
Structure - Control Unit
[Figure: structure of the control unit. Labels: Sequencing Logic, Control Unit Registers and Decoders, Control Memory; the control unit sits inside the CPU alongside the ALU, Registers and Internal Bus.]
Fakulti Sains Komputer dan Technology Maklumat (FSKTM), UTHM
Function of Control Unit
• For each operation a unique code is provided
—e.g. ADD, MOVE
• A hardware segment accepts the code and issues the control signals
• We have a computer!
9 BIT20303-Computer Architecture
Components
• The Control Unit and the Arithmetic and Logic Unit (ALU) constitute the Central Processing Unit (CPU)
• Data and instructions need to get into the system and results out
—Input/output
• Temporary storage of code and results is needed
—Main memory
Computer Components:
Top Level View
How Is an Instruction Executed?
• What is an instruction?
– An instruction specifies the action that the processor is supposed to take.
• The processing required for a single instruction is called an instruction cycle.
• An instruction cycle is made up of two steps:
– Fetch (the processor reads the instruction from memory; also referred to as the fetch cycle)
– Execute (also referred to as the execute cycle)
Fetch Cycle
• Program Counter (PC) holds address of next instruction to fetch
• Processor fetches instruction from memory location pointed to by PC
• Increment PC
—Unless told otherwise
• Instruction loaded into Instruction Register (IR)
• Processor interprets instruction and performs required actions
Execute Cycle
• An instruction’s execution (execute cycle) may involve one or a combination of these actions
—Processor-memory
– Data transfer between CPU and main memory
—Processor I/O
– Data transfer between CPU and I/O module
—Data processing
– Some arithmetic or logical operation on data
—Control
– Alteration of the sequence of execution (e.g. a jump to another instruction)
Instruction Format
• Assume both instructions and data are 16 bits (2 bytes) long.
• The instruction format provides 4 bits for the opcode, so that there can be as many as 2^4 = 16 different opcodes, and the remaining 12 bits hold an address, so up to 2^12 = 4096 words of memory can be directly addressed.
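The split described above can be sketched with a shift and a mask. This is only an illustration: it assumes the opcode sits in the four most significant bits and the address in the low twelve, and the name `decode` is made up for this example.

```python
OPCODE_BITS = 4
ADDRESS_BITS = 12

def decode(instruction: int) -> tuple[int, int]:
    """Split a 16-bit instruction word into (opcode, address)."""
    opcode = instruction >> ADDRESS_BITS                # top 4 bits
    address = instruction & ((1 << ADDRESS_BITS) - 1)   # low 12 bits
    return opcode, address

print(2 ** OPCODE_BITS)    # number of distinct opcodes: 16
print(2 ** ADDRESS_BITS)   # number of directly addressable words: 4096
opcode, address = decode(0x1940)
print(opcode, hex(address))  # 1 0x940
```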
[Figure: instruction format and integer format]
What is a Word, Half-Word and Double Word?
• A "word," in computing, is the natural unit of data that a processor handles, and so a standard size for data storage. The most common word sizes for modern computers are 16, 32, and 64 bits.
• Some systems or programming languages do not declare specific sizes for variables and instead use "word," "half-word" and "double word" to describe how much storage space is being allocated.
• This means that on a system with a 32-bit word size, a half-word is 16 bits and a double-word integer is a 64-bit integer.
Example of Program Execution
Internal CPU Registers
PC (Program Counter)
AC (Accumulator)
– a data register
IR (Instruction Register)
Program to be executed: adds the content of the memory word at address 940 to the content of the memory word at address 941 and stores the result in the latter location. (Assume a word = 16 bits = 2 bytes.)
(cont.) Example of Program Execution
Requires 3 fetch and 3 execute cycles.
1. {1st Fetch cycle} The PC contains 300, the address of the first instruction. This instruction (the value 1940 in hexadecimal) is loaded into the instruction register IR and the PC is incremented. Note that this process involves the use of a memory address register (MAR) and a memory buffer register (MBR). For simplicity these intermediate registers are ignored.
NOTE: The numbers used in this example are in hexadecimal, e.g. 0x1940.
2. {1st Execute cycle} The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is to be loaded. The remaining 12 bits (3 hexadecimal digits) specify the address (940) from which data are to be loaded.
3. {2nd Fetch cycle} The next instruction (5941) is fetched from location 301 and the PC is incremented.
4. {2nd Execute cycle} The old content of the AC and the content of location 941 are added and the result is stored in the AC.
5. {3rd Fetch cycle} The next instruction (2941) is fetched from location 302 and the PC is incremented.
6. {3rd Execute cycle} The content of AC is stored in location 941.
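The six steps can be replayed with a small simulation. This is a sketch under stated assumptions: the opcode meanings (1 = load AC, 5 = add to AC, 2 = store AC) are read off the walkthrough above, the data values 3 and 2 at locations 940 and 941 are assumed for illustration, and MAR/MBR are ignored as in the example.

```python
LOAD, STORE, ADD = 0x1, 0x2, 0x5   # opcodes as they appear in the example

def run(memory: dict[int, int], pc: int, steps: int) -> tuple[int, int]:
    """Run `steps` instruction cycles; return the final (PC, AC)."""
    ac = 0
    for _ in range(steps):
        # Fetch cycle: load the instruction at PC into IR, then increment PC.
        ir = memory[pc]
        pc += 1
        # Execute cycle: first hex digit is the opcode, the rest is the address.
        opcode, address = ir >> 12, ir & 0xFFF
        if opcode == LOAD:
            ac = memory[address]
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF   # keep AC within 16 bits
        elif opcode == STORE:
            memory[address] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return pc, ac

memory = {
    0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,   # program from the example
    0x940: 0x0003, 0x941: 0x0002,                   # assumed data values
}
pc, ac = run(memory, pc=0x300, steps=3)
print(hex(pc), ac, memory[0x941])   # 0x303 5 5
```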
Location
• Inside CPU (e.g. Registers)
• Internal (inside the computer e.g. RAM, Level 1 or L1 cache, L2 cache, L3 cache)
• External (outside of the computer e.g. Hard disks, SSD, removable drives)
A Modern Memory Hierarchy
• Register file: 32 words, sub-nsec
• L1 cache: ~32 KB, ~nsec
• L2 cache: 512 KB ~ 1 MB, many nsec
• L3 cache, .....
• Main memory (DRAM): GB, ~100 nsec
• Swap disk: 100 GB, ~10 msec
• How data moves between levels:
– Registers and memory: manual/compiler register spilling
– Caches and main memory: automatic HW cache management
– Main memory and swap disk: automatic demand paging
Memory Abstraction
How is a memory location accessed?
• Random (e.g. RAM) – each location has its own address and is reached directly; access time does not depend on the location or on previous accesses
• Direct (e.g. hard disk) – each block has a unique address; access is by jumping to the specific block, then searching sequentially within it
• Associative (e.g. cache) – data is retrieved by matching a portion of its contents rather than by its address
• Sequential (e.g. tape) – access starts from the beginning and passes through the data in order; access time depends on the location of the data and on the previous location
Memory Technology: DRAM
• Dynamic random access memory
• Capacitor charge state indicates stored value
– Whether the capacitor is charged or discharged indicates storage of 1 or 0
– 1 capacitor
– 1 access transistor
• Capacitor leaks through the RC path
– DRAM cell loses charge over time
– DRAM cell needs to be refreshed
[Figure: DRAM cell, with "row enable" and "_bitline" labels]
Memory Technology: SRAM
• Static random access memory
• Two cross coupled inverters store a single bit
– Feedback path enables the stored value to persist in the “cell”
– 4 transistors for storage
– 2 transistors for access
[Figure: SRAM cell, with "row select", "bitline" and "_bitline" labels]
Memory Hierarchy
• Fundamental tradeoff
– Fast memory: small
– Large memory: slow
• Idea: Memory hierarchy
• Latency, cost, size, bandwidth
[Figure: CPU with register file (RF), followed by Cache, Main Memory (DRAM) and Hard Disk]
Caching Basics: Exploit Temporal Locality
• Idea: Store recently accessed data in automatically managed fast memory (called cache)
• Anticipation: the data will be accessed again soon
• Temporal locality principle
– Recently accessed data will be accessed again in the near future
– This is what Maurice Wilkes had in mind:
• Wilkes, “Slave Memories and Dynamic Storage Allocation,” IEEE Trans. on Electronic Computers, 1965.
• “The use is discussed of a fast core memory of, say 32000 words as a slave to a slower core memory of, say, one million words in such a way that in practical cases the effective access time is nearer that of the fast memory than that of the slow memory.”
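A rough way to see temporal locality pay off is to count hits in a tiny cache model. The sketch below assumes a fully associative cache with least-recently-used (LRU) replacement, which is one common policy and not something the slides specify; the function name `count_hits` is illustrative.

```python
from collections import OrderedDict

def count_hits(accesses, capacity):
    """Count cache hits for a sequence of addresses under LRU replacement."""
    cache = OrderedDict()          # keys are cached addresses, oldest first
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[addr] = True
    return hits

loop = [0, 1, 2, 3] * 25               # 100 accesses that reuse 4 addresses
print(count_hits(loop, capacity=4))    # 96: only the first pass misses
print(count_hits(loop, capacity=2))    # 0: the working set does not fit
```

The second call shows the flip side: when the reused data does not fit in the fast memory, the benefit disappears.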
Caching Basics: Exploit Spatial Locality
• Idea: Store addresses adjacent to the recently accessed one in automatically managed fast memory
– Logically divide memory into equal size blocks
– Fetch to cache the accessed block in its entirety
• Anticipation: nearby data will be accessed soon
• Spatial locality principle
– Nearby data in memory will be accessed in the near future
• E.g., sequential instruction access, array traversal
– This is what IBM 360/85 implemented
• 16 Kbyte cache with 64 byte blocks
• Liptay, “Structural aspects of the System/360 Model 85 II: the cache,” IBM Systems Journal, 1968.
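The effect of fetching whole blocks can be illustrated by counting hits during an array traversal. This is a simplified sketch: it assumes the cache never evicts anything (unlimited capacity), uses 4-byte array elements as an arbitrary choice, and borrows the 64-byte block size from the IBM 360/85 figure above.

```python
def count_hits(addresses, block_size):
    """Count hits when every miss brings the whole enclosing block into the cache."""
    cached_blocks = set()      # assumption: unlimited capacity, so no evictions
    hits = 0
    for addr in addresses:
        block = addr // block_size     # which block this byte address falls in
        if block in cached_blocks:
            hits += 1
        else:
            cached_blocks.add(block)   # miss: fetch the entire block
    return hits

array_traversal = range(0, 1024, 4)    # 256 sequential 4-byte elements
print(count_hits(array_traversal, block_size=64))  # 240: one miss per 16 elements
print(count_hits(array_traversal, block_size=4))   # 0: each element is its own block
```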
The Bookshelf Analogy
• Book in your hand
• Desk
• Bookshelf
• Boxes at home
• Boxes in storage
• Recently-used books tend to stay on desk
– Comp Arch books, books for classes you are currently taking
– Until the desk gets full
• Adjacent books in the shelf needed around the same time
– If I have organized/categorized my books well in the shelf
Input/Output Problems
• Wide variety of peripherals
– Delivering different amounts of data
– At different speeds
– In different formats
• All slower than CPU and RAM
• Need I/O modules
Input/Output Module
• Interface to CPU and Memory
• Interface to one or more peripherals
External Devices
• Human readable
– Screen, printer, keyboard
• Machine readable
– Monitoring and control
• Communication
– Modem
– Network Interface Card (NIC)
External Device Block Diagram
• Control signals determine the function that the device will perform, such as sending data to the I/O module (INPUT or READ) or accepting data from the I/O module (OUTPUT or WRITE).
• Status signals indicate the state of the device, e.g. busy or idle.
• Data are transferred to or from the I/O module, in the direction set by the control signal (READ or WRITE).
• The buffer temporarily holds the data being transferred between the I/O module and the external environment.
I/O Module Functions
• Control & Timing
• CPU Communication
• Device Communication
• Data Buffering
• Error Detection
Three Techniques for Input of a Block of Data
What are the differences between these techniques?
Programmed I/O
• CPU has direct control over I/O
– Sensing status
– Read/write commands
– Transferring data
• CPU waits for I/O module to complete operation
• Wastes CPU time
Programmed I/O - detail
• CPU requests I/O operation
• I/O module performs operation
• I/O module sets status bits
• CPU checks status bits periodically
• I/O module does not inform CPU directly
• I/O module does not interrupt CPU
• CPU may wait or come back later
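The polling behaviour listed above can be mimicked with a toy model. Everything here is hypothetical: `IOModule`, `status_ready` and `programmed_read` are illustrative names, and the "delay" simply counts status checks rather than modelling real device timing.

```python
class IOModule:
    """Toy I/O module: becomes ready after a fixed number of status checks."""
    def __init__(self, data, delay):
        self._data = data
        self._remaining = delay

    def status_ready(self):
        # Each call models one read of the status bits by the CPU.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read_data(self):
        return self._data

def programmed_read(module):
    """CPU requests a read, then busy-waits on the status bits."""
    polls = 0
    while not module.status_ready():
        polls += 1          # CPU time spent doing nothing useful
    return module.read_data(), polls

value, wasted = programmed_read(IOModule(data=0x41, delay=1000))
print(value, wasted)   # 65 1000
```

The `wasted` count stands in for the CPU time lost to waiting, which is the cost that the other two techniques aim to reduce.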
Interrupt Driven I/O Basic Operation
• CPU issues read command
• I/O module gets data from peripheral whilst CPU does other work
• I/O module interrupts CPU
• CPU requests data
• I/O module transfers data
DMA
• Interrupt driven and programmed I/O require active CPU intervention
– Transfer rate is limited
– CPU is tied up
• DMA is the answer
DMA Operation
• CPU tells the DMA controller:
– Read/Write
– Device address
– Starting address of memory block for data
– Amount of data to be transferred
• CPU carries on with other work
• DMA controller deals with transfer
• DMA controller sends interrupt when finished
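The information handed to the DMA controller, and the completion interrupt, can be sketched as follows. This is a simplified, single-threaded model: `DMARequest` and `dma_transfer` are hypothetical names, a Python list stands in for the device, and a callback stands in for the interrupt. In real hardware the transfer would proceed while the CPU does other work.

```python
from dataclasses import dataclass

@dataclass
class DMARequest:
    write_to_memory: bool   # True: device -> memory; False: memory -> device
    device: list            # stands in for the device address
    memory_start: int       # starting address of the memory block
    count: int              # amount of data (words) to transfer

def dma_transfer(memory, request, on_done):
    """Move the block without the CPU touching the data, then signal completion."""
    for i in range(request.count):
        if request.write_to_memory:
            memory[request.memory_start + i] = request.device[i]
        else:
            request.device[i] = memory[request.memory_start + i]
    on_done()   # models the interrupt sent when the transfer has finished

memory = [0] * 16
device_buffer = [10, 20, 30, 40]
interrupts = []
dma_transfer(memory,
             DMARequest(True, device_buffer, memory_start=4, count=4),
             on_done=lambda: interrupts.append("dma_done"))
print(memory[4:8], interrupts)   # [10, 20, 30, 40] ['dma_done']
```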
DMA Transfer Cycle Stealing
• DMA controller takes over bus for a cycle
• Transfer of one word of data
• Not an interrupt – CPU does not switch context
• CPU is suspended just before it accesses the bus – i.e. before an operand or data fetch or a data write
• Slows down the CPU, but not as much as the CPU doing the transfer itself
Unsigned Integer
• 0101 + 0010
=(4+1) + 2 = 7
0101
+ 0010
0111
• 0101 1010 + 0001 0001
0101 1010
+ 0001 0001
0110 1011
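The two additions above can be checked mechanically. The helper below is a small sketch (the name `add_unsigned` is illustrative); it also reports a carry out of the top bit, which the slide examples do not produce.

```python
def add_unsigned(a_bits: str, b_bits: str) -> tuple[str, bool]:
    """Add two equal-width unsigned binary strings; return (sum, carry_out)."""
    width = len(a_bits)
    total = int(a_bits, 2) + int(b_bits, 2)
    carry_out = total >= (1 << width)          # result did not fit in `width` bits
    return format(total % (1 << width), f"0{width}b"), carry_out

print(add_unsigned("0101", "0010"))          # ('0111', False)      5 + 2 = 7
print(add_unsigned("01011010", "00010001"))  # ('01101011', False)  90 + 17 = 107
print(add_unsigned("1111", "0001"))          # ('0000', True)       15 + 1 overflows 4 bits
```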
Signed Integers (2’s Complement)
OVERFLOW RULE: If two numbers are added and they are both positive or both negative, then overflow occurs if and only if the result has the opposite sign.
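The rule translates directly into a sign comparison. The sketch below assumes a 4-bit width by default and inputs that are already within range for that width; `add_twos_complement` is an illustrative name.

```python
def add_twos_complement(a: int, b: int, bits: int = 4) -> tuple[int, bool]:
    """Add two signed integers in `bits`-bit two's complement.

    Returns (wrapped result, overflow flag)."""
    mask = (1 << bits) - 1
    raw = (a + b) & mask                                   # keep only `bits` bits
    result = raw - (1 << bits) if raw >> (bits - 1) else raw  # reinterpret sign bit
    same_sign = (a < 0) == (b < 0)
    overflow = same_sign and ((result < 0) != (a < 0))     # the overflow rule
    return result, overflow

print(add_twos_complement(5, 4))     # 0101 + 0100 = 1001 -> (-7, True)
print(add_twos_complement(-7, -6))   # 1001 + 1010 = 0011 -> (3, True)
print(add_twos_complement(5, -3))    # 0101 + 1101 = 0010 -> (2, False)
```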
Single-Precision Floating Point
FORMULA: Sign (1 bit) . Exponent (3 bit) . Significand (4 bit)
ANSWER: 1.125 x 0.5 = 0.5625
Note: Bias = 3, thus exponent = -1 (where 010 is 2; thus 2 – 3 = -1), and 1.001 = 1 + (1/8) = 1 + 0.125 = 1.125
Single Precision Floating Point
• 0 010 0010 (8 bit)
• Sign = 0
• Exponent = 010 = 2, so exponent – bias = 2 – 3 = -1
• Significand = 0010 = 2^-3 = (1/8) = 0.125
• (-1)^sign x 1.significand x 2^(exponent – bias)
= (-1)^0 x 1.0010 x 2^-1
= 1 x (1 + 0.125) x (1/2) = 1.125 x 0.5 = 0.5625
• 1 01111110 00100000000 000000000000 (32 bit)
• (-1)^sign x 1.significand x 2^(exponent – bias)
= (-1)^1 x 1.0010 x 2^(126 – 127)
= -1 x (1 + 0.125) x 2^-1
= -1.125 x 0.5 = -0.5625
NOTE: For 8 bit, bias = 3 (exponent range -3 to 4); for 32 bit, bias = 127 (exponent range -127 to 128)
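As a cross-check, both bit patterns can be decoded programmatically with the biases given in the note (3 for the 8-bit format, 127 for the longer one). This is a sketch for normalized values only: `decode_float` is an illustrative name, and the special encodings that IEEE 754 reserves (all-zero and all-one exponents) are not handled.

```python
import struct

def decode_float(bits: str, exp_bits: int, bias: int) -> float:
    """Decode a normalized value laid out as sign | exponent | significand."""
    bits = bits.replace(" ", "")
    sign = int(bits[0])
    exponent = int(bits[1:1 + exp_bits], 2)
    fraction_bits = bits[1 + exp_bits:]
    fraction = int(fraction_bits, 2) / (1 << len(fraction_bits))
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - bias)

print(decode_float("0 010 0010", exp_bits=3, bias=3))   # 0.5625

word = "1 01111110 001" + "0" * 20     # 1 sign + 8 exponent + 23 significand bits
print(decode_float(word, exp_bits=8, bias=127))         # -0.5625

# Compare with Python's own decoding of the same 32-bit pattern.
raw = int(word.replace(" ", ""), 2).to_bytes(4, "big")
print(struct.unpack(">f", raw)[0])                      # -0.5625
```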
CPU Structure
• CPU must:
– Fetch instructions
– Interpret instructions
– Fetch data
– Process data
– Write data
Registers
• Small storage locations inside the CPU
• Faster than main memory
Types of Registers
• General Purpose
• Data
• Address – hold addresses that are used by instructions to access main memory (RAM)
• Control and Status
How to increase the speed of the CPU?
• Improve organization – e.g. locate cache nearer to the CPU, increase bus bandwidth
• Increase clock frequency – e.g. from 1 GHz to 5 GHz
• Increase parallelism – e.g. pipelining, superscalar execution, Simultaneous Multithreading (SMT)