10-1 Chapter 10 - Advanced Computer Architecture
Computer Architecture and Organization by M. Murdocca and V. Heuring © 2007 M. Murdocca and V. Heuring
Computer Architecture and Organization
Miles Murdocca and Vincent Heuring
Chapter 10 – Advanced Computer Architecture
Chapter Contents
10.1 Parallel Architecture
10.2 Superscalar Machines and the PowerPC
10.3 VLIW Machines and the Itanium
10.4 Case Study: Extensions to the Instruction Set – The Intel MMX/SSEX and Motorola Altivec SIMD Instructions
10.5 Programmable Logic Devices and Custom ICs
10.6 Unconventional Architectures
Parallel Speedup and Amdahl’s Law
• In the context of parallel processing, speedup is the ratio of sequential to parallel execution time: $S = T_{\text{sequential}} / T_{\text{parallel}}$
• Amdahl’s law, for p processors and a fraction f of unparallelizable code: $S = \dfrac{1}{f + (1 - f)/p}$
• For example, if f = 10% of the operations must be performed sequentially, then speedup can be no greater than 10 regardless of how many processors are used: as $p \to \infty$, $S \to 1/f = 1/0.1 = 10$.
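The limit described above can be checked numerically. This is a minimal sketch that assumes the usual form of Amdahl's law, S = 1 / (f + (1 - f)/p); the function name `amdahl_speedup` is illustrative and does not come from the slides.

```python
def amdahl_speedup(f: float, p: float) -> float:
    """Speedup on p processors when a fraction f of the work is sequential."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Sequential part takes f; parallel part (1 - f) is divided among p processors.
    return 1.0 / (f + (1.0 - f) / p)


# With f = 0.10 the speedup approaches, but never reaches, 1/f = 10.
for processors in (1, 10, 100, 1000, 1_000_000):
    print(processors, amdahl_speedup(0.10, processors))
```

With f = 0.10 and p = 10 the result is 1/0.19, roughly 5.3, which is the speedup figure used on the efficiency slide.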
Efficiency and Throughput
• Efficiency is the ratio of speedup to the number of processors used. For a speedup of 5.3 with 10 processors, the efficiency is: $E = 5.3 / 10 = 0.53$, or 53%.
• Throughput is a measure of how much computation is achieved over time, and is of special concern for I/O-bound and pipelined applications. For the case of a four-stage pipeline that remains filled, in which each pipeline stage completes its task in 10 ns, the average time to complete an operation is 10 ns even though it takes 40 ns to execute any one operation. The overall throughput for this situation is then: $1\ \text{operation} / 10\ \text{ns} = 10^{8}$ operations per second.
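The two measures above reduce to short arithmetic, sketched here under the slide's own numbers. The function names are hypothetical, and the pipeline is assumed to stay full so that one result emerges per stage time.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Ratio of achieved speedup to the number of processors used."""
    return speedup / processors


def pipeline_latency(stages: int, stage_time_s: float) -> float:
    """Time for a single operation to pass through every stage."""
    return stages * stage_time_s


def pipeline_throughput(stage_time_s: float) -> float:
    """Operations per second for a full pipeline: one completes per stage time."""
    return 1.0 / stage_time_s


print(efficiency(5.3, 10))          # fraction of ideal speedup
print(pipeline_latency(4, 10e-9))   # seconds for one operation
print(pipeline_throughput(10e-9))   # operations per second
```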
Flynn Taxonomy
• Classification of architectures according to the Flynn taxonomy: (a) SISD; (b) SIMD; (c) MIMD; (d) MISD.
Network Topologies
• Network topologies: (a) crossbar; (b) bus; (c) ring; (d) mesh; (e) star; (f) tree; (g) perfect shuffle; (h) hypercube.
Crossbar
• Internal organization of a crossbar.
Crosspoint Settings
• (a) Crosspoint settings for connections 0 → 3 and 3 → 0; (b) adjusted settings to accommodate connection 1 → 1.
Three-Stage Clos Network
12-Channel Three-Stage Clos Network with n = p = 6
12-Channel Three-Stage Clos Network with n = p = 2
12-Channel Three-Stage Clos Network with n = p = 4
12-Channel Three-Stage Clos Network with n = p = 3
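The four 12-channel configurations can be compared by counting crosspoints. The sketch below rests on assumptions not stated on the slides: N channels, n inputs per input-stage crossbar, p outputs per output-stage crossbar, and k middle-stage crossbars with k = n + p - 1, the standard strictly nonblocking condition. The function name is hypothetical.

```python
from typing import Optional


def clos_crosspoints(N: int, n: int, p: int, k: Optional[int] = None) -> int:
    """Crosspoint count of a three-stage Clos network (assumed structure)."""
    if N % n or N % p:
        raise ValueError("n and p must divide N")
    if k is None:
        k = n + p - 1  # assumed: strictly nonblocking number of middle crossbars
    input_stage = (N // n) * (n * k)         # N/n crossbars, each n x k
    middle_stage = k * (N // n) * (N // p)   # k crossbars, each (N/n) x (N/p)
    output_stage = (N // p) * (k * p)        # N/p crossbars, each k x p
    return input_stage + middle_stage + output_stage


N = 12
print("single crossbar:", N * N)
for size in (6, 2, 4, 3):
    print("n = p =", size, "->", clos_crosspoints(N, size, size))
```

For a network this small, every strictly nonblocking configuration under these assumptions uses more crosspoints than the single 12 × 12 crossbar; the savings of a Clos network appear only as N grows.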
C function computes (x² + y²) × y²
Dependency Graph
• (a) Control sequence for C program; (b) dependency graph for C program.
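A dependency graph shows which operations can proceed in parallel. The sketch below assumes the function computes (x^2 + y^2) * y^2 through four temporaries; the names `t0` to `t3` and the table layout are illustrative and are not taken from the figure.

```python
# Each temporary maps to (operator, left operand, right operand).
# Hypothetical decomposition of (x^2 + y^2) * y^2.
OPS = {
    "t0": ("mul", "x", "x"),
    "t1": ("mul", "y", "y"),
    "t2": ("add", "t0", "t1"),
    "t3": ("mul", "t2", "t1"),
}


def schedule_levels(ops):
    """Group operations into steps; operations in one step are independent."""
    level = {}

    def depth(name):
        if name not in ops:          # an input variable such as x or y
            return 0
        if name not in level:
            _, left, right = ops[name]
            level[name] = 1 + max(depth(left), depth(right))
        return level[name]

    for name in ops:
        depth(name)
    steps = {}
    for name, lv in level.items():
        steps.setdefault(lv, []).append(name)
    return [sorted(steps[lv]) for lv in sorted(steps)]


def evaluate(ops, inputs):
    """Evaluate the graph step by step, in dependency order."""
    values = dict(inputs)
    for step in schedule_levels(ops):
        for name in step:
            op, left, right = ops[name]
            a, b = values[left], values[right]
            values[name] = a * b if op == "mul" else a + b
    return values


print(schedule_levels(OPS))
print(evaluate(OPS, {"x": 3, "y": 2})["t3"])
```

Under this decomposition the two squarings have no mutual dependency and can run in the same step, while the addition and the final multiplication must follow in sequence.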
Matrix Multiplication
• (a) Problem setup for Ax = b; (b) equations for computing the b_i.
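As a minimal sketch of the computation, each b_i is the dot product of row i of A with x, so the rows can be evaluated independently of one another. The function names are illustrative.

```python
def dot(row, x):
    """b_i = sum over j of a_ij * x_j for one row of A."""
    if len(row) != len(x):
        raise ValueError("row and vector lengths differ")
    return sum(a_ij * x_j for a_ij, x_j in zip(row, x))


def matvec(A, x):
    """Compute b = Ax; no b_i depends on any other b_i."""
    return [dot(row, x) for row in A]


A = [[1, 2],
     [3, 4]]
x = [5, 6]
print(matvec(A, x))
```

Because the rows share only the read-only vector x, each call to `dot` could be assigned to a separate processor; within a row, the products are independent and only the summation imposes an ordering.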
Matrix Multiplication Dependency Graph
The PowerPC 601 Architecture
128-Bit IA-64 Instruction Word
• Each 41-bit instruction consists of three register addresses (7 bits each, giving 128 possible registers), a predicate register (6 bits), and the opcode and flags or a general-purpose register (14 bits, varies by instruction): 3 × 7 + 6 + 14 = 41 bits. Three such instructions plus a 5-bit template make up the 128-bit word.
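The bit budget can be checked with a small packing sketch. It assumes the commonly documented IA-64 bundle layout, a 5-bit template in the low bits followed by three 41-bit instruction slots; the template and the bit positions are assumptions here, not details given on the slide, and the function names are hypothetical.

```python
TEMPLATE_BITS = 5    # assumed: template field occupies the low 5 bits
SLOT_BITS = 41       # one instruction per slot
SLOTS = 3

# Field widths quoted on the slide: three 7-bit registers, 6-bit predicate, 14 other bits.
assert 3 * 7 + 6 + 14 == SLOT_BITS
assert TEMPLATE_BITS + SLOTS * SLOT_BITS == 128


def pack_bundle(template, slots):
    """Pack a template and three 41-bit instructions into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    if len(slots) != SLOTS:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, instruction in enumerate(slots):
        if not 0 <= instruction < (1 << SLOT_BITS):
            raise ValueError("instruction does not fit in 41 bits")
        bundle |= instruction << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit integer back into its template and three slots."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(SLOTS)]
    return template, slots


bundle = pack_bundle(0x10, [1, 2, (1 << 41) - 1])
print(hex(bundle), unpack_bundle(bundle))
```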
Itanium Instruction Types
Allowable Combinations of IA-64 Instruction Types Assigned to Instruction Slots
IA-64 Instruction Issues
• Maximum number of IA-64 instructions that can be executed for each pairing of bundles.
Intel MMX (MultiMedia eXtensions)
• Vector addition of eight bytes by the Intel PADDB mm0, mm1 instruction:
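The effect of PADDB can be simulated on plain integers. In this sketch each 64-bit operand is treated as eight independent bytes, and each byte sum wraps modulo 256 with no carry into the neighboring byte; the Python function name `paddb` is illustrative only.

```python
def paddb(mm0: int, mm1: int) -> int:
    """Add two 64-bit values as eight separate bytes with wraparound."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (mm0 >> shift) & 0xFF
        b = (mm1 >> shift) & 0xFF
        # Keep only the low 8 bits so a carry never reaches the next lane.
        result |= ((a + b) & 0xFF) << shift
    return result


print(hex(paddb(0x0102030405060708, 0x1010101010101010)))
print(hex(paddb(0x00000000000000FF, 0x0000000000000001)))
```

The second call shows the difference from an ordinary 64-bit add: 0xFF + 0x01 wraps to 0x00 in the low byte and leaves the byte above it unchanged.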
Intel and Motorola Vector Registers
• Intel “aliases” the floating-point registers as MMX registers. The eight 64-bit MMX registers are mapped onto the low 64 bits of the Pentium’s eight 80-bit floating-point registers, so the same registers do double duty.
• Motorola implements 32 128-bit vector registers as a new set, separate and distinct from the floating-point registers.
MMX and AltiVec Arithmetic Instructions
Comparing Two MMX Byte Vectors for Equality
Conditional Assignment of an MMX Byte Vector
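Byte-wise comparison and conditional assignment combine naturally: the comparison yields a mask, and the mask selects bytes from one of two sources. The sketch below assumes the usual mask-and-merge idiom (a PCMPEQB-style compare producing 0xFF or 0x00 per byte, followed by AND, AND-NOT, and OR); it is not a transcription of the slide figures, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def pcmpeqb(a: int, b: int) -> int:
    """Per-byte equality: 0xFF where the bytes match, 0x00 where they differ."""
    mask = 0
    for lane in range(8):
        shift = 8 * lane
        if ((a >> shift) & 0xFF) == ((b >> shift) & 0xFF):
            mask |= 0xFF << shift
    return mask


def select_bytes(mask: int, if_set: int, if_clear: int) -> int:
    """Take bytes from if_set where mask is 0xFF, else from if_clear."""
    return (mask & if_set) | (~mask & MASK64 & if_clear)


a = 0x1122334455667788
b = 0x1100330055007700
mask = pcmpeqb(a, b)
print(hex(mask))
print(hex(select_bytes(mask, 0xAAAAAAAAAAAAAAAA, 0x5555555555555555)))
```

The selection involves no branch: every byte lane is processed identically, which is what lets a SIMD unit perform the conditional assignment on all eight bytes at once.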
A PAL Device
PLAs and PALs are similar except that the OR gates in a PAL have a fixed number of inputs and the inputs are not programmable. PALs are more prevalent than PLAs because they are easier to manufacture and are less complex.
Complex Programmable Logic Device
CPLDs are PAL-like or PLA-like blocks that can be combined with programmable interconnections. Commercial CPLDs may contain as many as 200,000 equivalent gates and have over 3,000 macrocells.
Field Programmable Gate Array
Unlike CPLDs, which employ large logic blocks and fewer interconnection options, FPGAs employ small logic blocks that can be programmably interconnected.
Quantum Computing
Single-particle interference experiment.
Multi-Valued Logic
Truth tables for binary and ternary comparison functions:
Neural Networks
Model of a living neuron, and model of an artificial neuron (below).
Artificial Neural Network Example
Two simple, feed-forward neural networks with inputs, weights, and thresholds as shown.
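A feed-forward network of threshold neurons can be sketched in a few lines. The weights and thresholds below are illustrative choices, not the values from the figure, and the firing rule (output 1 when the weighted sum reaches the threshold) is an assumed convention.

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def and_net(x1, x2):
    # Single neuron: both inputs are needed to reach the threshold of 2.
    return neuron((x1, x2), (1, 1), 2)


def or_net(x1, x2):
    # Single neuron: either input alone reaches the threshold of 1.
    return neuron((x1, x2), (1, 1), 1)


def xor_net(x1, x2):
    # Two layers: the output fires when OR is on and AND is off.
    hidden = (or_net(x1, x2), and_net(x1, x2))
    return neuron(hidden, (1, -1), 1)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, and_net(x1, x2), or_net(x1, x2), xor_net(x1, x2))
```

The XOR case needs the second layer because no single threshold neuron with two inputs can separate its outputs; the negative weight on the AND neuron suppresses the output when both inputs are on.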