10-1 Chapter 10 - Advanced Computer Architecture

Computer Architecture and Organization by M. Murdocca and V. Heuring © 2007 M. Murdocca and V. Heuring

Computer Architecture and Organization

Miles Murdocca and Vincent Heuring

Chapter 10 – Advanced Computer Architecture

Chapter Contents

10.1 Parallel Architecture

10.2 Superscalar Machines and the PowerPC

10.3 VLIW Machines and the Itanium

10.4 Case Study: Extensions to the Instruction Set – The Intel MMX/SSEX and Motorola AltiVec SIMD Instructions

10.5 Programmable Logic Devices and Custom ICs

10.6 Unconventional Architectures

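Parallel Speedup and Amdahl’s Law

• In the context of parallel processing, speedup S can be computed as the ratio of sequential to parallel execution time:

$$S = \frac{T_{\text{sequential}}}{T_{\text{parallel}}}$$

• Amdahl’s law, for p processors and a fraction f of unparallelizable code:

$$S = \frac{1}{f + \frac{1 - f}{p}}$$

• For example, if f = 10% of the operations must be performed sequentially, then speedup can be no greater than 10 regardless of how many processors are used:

$$S = \frac{1}{0.1 + \frac{0.9}{p}} < \frac{1}{0.1} = 10$$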
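
The bound in the example can be checked numerically. The following is a minimal sketch, assuming the usual form of Amdahl's law, S = 1 / (f + (1 - f) / p); the function names are illustrative and do not come from the slides.

```python
def amdahl_speedup(f: float, p: float) -> float:
    """Speedup on p processors when a fraction f of the work is sequential."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


def speedup_limit(f: float) -> float:
    """Upper bound on speedup as p grows without bound (requires f > 0)."""
    if f <= 0.0:
        raise ValueError("the bound is only finite for f > 0")
    return 1.0 / f


if __name__ == "__main__":
    # With f = 0.1 the speedup approaches, but never reaches, 10.
    for p in (1, 10, 100, 1000):
        print(p, round(amdahl_speedup(0.1, p), 2))
```

With f = 0.1 and 10 processors this gives a speedup of about 5.26, well short of the limit of 10.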

Efficiency and Throughput

• Efficiency is the ratio of speedup to the number of processors used. For a speedup of 5.3 with 10 processors, the efficiency is:

$$\frac{5.3}{10} = 0.53 = 53\%$$

• Throughput is a measure of how much computation is achieved over time, and is of special concern for I/O-bound and pipelined applications. For the case of a four-stage pipeline that remains filled, in which each pipeline stage completes its task in 10 ns, the average time to complete an operation is 10 ns even though it takes 40 ns to execute any one operation. The overall throughput for this situation is then:

$$\frac{1\ \text{operation}}{10\ \text{ns}} = 10^{8}\ \text{operations per second}$$
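
The two figures above follow from simple ratios. This short sketch reproduces them; the helper names are hypothetical, and the pipeline is assumed to stay full, as the slide states.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Ratio of speedup to the number of processors used."""
    return speedup / processors


def pipeline_latency(stage_time_s: float, stages: int) -> float:
    """Time for one operation to pass through every stage."""
    return stage_time_s * stages


def pipeline_throughput(stage_time_s: float) -> float:
    """Operations completed per second when the pipeline remains filled."""
    return 1.0 / stage_time_s


if __name__ == "__main__":
    print(efficiency(5.3, 10))          # fraction of ideal speedup achieved
    print(pipeline_latency(10e-9, 4))   # seconds for any single operation
    print(pipeline_throughput(10e-9))   # operations per second
```

Latency depends on the number of stages, while throughput depends only on the stage time. This is why the 40 ns operation still completes at a rate of one per 10 ns.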

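Flynn Taxonomy

• Classification of architectures according to the Flynn taxonomy: (a) SISD (single instruction stream, single data stream); (b) SIMD (single instruction stream, multiple data streams); (c) MIMD (multiple instruction streams, multiple data streams); (d) MISD (multiple instruction streams, single data stream).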

Network Topologies

• Network topologies: (a) crossbar; (b) bus; (c) ring; (d) mesh; (e) star; (f) tree; (g) perfect shuffle; (h) hypercube.

Crossbar

• Internal organization of a crossbar.

Crosspoint Settings

• (a) Crosspoint settings for connections 0 → 3 and 3 → 0; (b) adjusted settings to accommodate connection 1 → 1.
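
A crossbar can route any input to any free output. The sketch below is a simplified model of that property and is not the circuit from the figure: it ignores the physical crosspoint states and only records which input reaches which output. The class and method names are hypothetical.

```python
class Crossbar:
    """Abstract N x N crossbar: each input and each output carries one connection."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.routes: dict[int, int] = {}  # input port -> output port

    def connect(self, src: int, dst: int) -> None:
        if not (0 <= src < self.size and 0 <= dst < self.size):
            raise ValueError("port out of range")
        if src in self.routes:
            raise ValueError(f"input {src} is already connected")
        if dst in self.routes.values():
            raise ValueError(f"output {dst} is already in use")
        self.routes[src] = dst

    def connections(self) -> list[tuple[int, int]]:
        """Current (input, output) pairs in input order."""
        return sorted(self.routes.items())


if __name__ == "__main__":
    xbar = Crossbar(4)
    xbar.connect(0, 3)
    xbar.connect(3, 0)
    xbar.connect(1, 1)  # the additional connection accommodated in part (b)
    print(xbar.connections())
```

In this model a new connection is refused only when its input or output is already busy. Existing routes never block it.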

Three-Stage Clos Network

12-Channel Three-Stage Clos Network with n = p = 6

12-Channel Three-Stage Clos Network with n = p = 2

12-Channel Three-Stage Clos Network with n = p = 4

12-Channel Three-Stage Clos Network with n = p = 3
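
The four 12-channel configurations can be compared by counting crosspoints. The sketch below rests on an assumption the slides do not confirm: n is taken to be the number of channels per first-stage (and third-stage) switch, and p the number of middle-stage switches. Under that reading the first and third stages each contain N/n switches of size n × p, and the middle stage contains p switches of size (N/n) × (N/n).

```python
def clos_crosspoints(channels: int, n: int, p: int) -> int:
    """Crosspoint count of a three-stage Clos network (assumed meaning of n, p)."""
    if channels % n != 0:
        raise ValueError("n must divide the number of channels")
    groups = channels // n              # switches in the first and third stages
    outer = 2 * groups * n * p          # first stage plus third stage
    middle = p * groups * groups        # p middle switches, each groups x groups
    return outer + middle


def crossbar_crosspoints(channels: int) -> int:
    """Crosspoint count of a single full crossbar, for comparison."""
    return channels * channels


if __name__ == "__main__":
    print("crossbar", crossbar_crosspoints(12))
    for size in (6, 4, 3, 2):
        print("n = p =", size, clos_crosspoints(12, size, size))
```

Under these assumptions the 12-channel totals vary with the choice of n and p, so not every configuration needs fewer crosspoints than the single crossbar.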

C function computes (x² + y²) y²

Dependency Graph

• (a) Control sequence for C program; (b) dependency graph for C program.
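
A dependency graph shows which operations can run at the same time. The sketch below groups operations into parallel steps by their depth in the graph. The operation names are hypothetical and are not taken from the figure. Treating the expression as a product of (x² + y²) and y² is also an assumption.

```python
def parallel_steps(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group operations so that each step depends only on earlier steps."""
    level: dict[str, int] = {}

    def depth(node: str) -> int:
        # An operation with no inputs from other operations sits at level 0.
        if node not in level:
            level[node] = 1 + max((depth(d) for d in deps[node]), default=-1)
        return level[node]

    for node in deps:
        depth(node)

    steps: dict[int, list[str]] = {}
    for node, lv in level.items():
        steps.setdefault(lv, []).append(node)
    return [sorted(steps[lv]) for lv in sorted(steps)]


# Hypothetical decomposition of (x*x + y*y) * (y*y) into single operations.
DEPENDENCIES = {
    "x_squared": [],
    "y_squared": [],
    "sum": ["x_squared", "y_squared"],
    "result": ["sum", "y_squared"],
}

if __name__ == "__main__":
    for step, ops in enumerate(parallel_steps(DEPENDENCIES)):
        print(step, ops)
```

In this decomposition the two squarings are independent and can run in parallel, while the sum and the final operation must follow in sequence.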

Matrix Multiplication

• (a) Problem setup for Ax = b; (b) equations for computing the b_i, where each b_i is the sum over j of a_ij x_j.
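
Each b_i is the dot product of row i of A with x, so the rows can be computed independently of one another. A minimal sketch using plain Python lists, with an illustrative function name:

```python
def matvec(a: list[list[float]], x: list[float]) -> list[float]:
    """Compute b = A x; every b[i] depends only on row i of A and on x."""
    for row in a:
        if len(row) != len(x):
            raise ValueError("row length must match vector length")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


if __name__ == "__main__":
    A = [[1, 2],
         [3, 4]]
    x = [5, 6]
    print(matvec(A, x))
```

Because no b_i reads another b_i, each row could be assigned to a separate processor. The products within a row are also independent of one another, and only the additions that combine them must be ordered.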

Matrix Multiplication Dependency Graph

The PowerPC 601 Architecture

128-Bit IA-64 Instruction Word

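• Each 128-bit instruction word (bundle) holds three 41-bit instructions plus a 5-bit template field (3 × 41 + 5 = 128). Each 41-bit instruction consists of three register addresses (7 bits each, addressing 128 possible registers), a predicate register (6 bits), and the opcode and flags or general-purpose register (14 bits, varies by instruction).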
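
The field widths can be checked by packing and unpacking a bundle as a Python integer. This is a sketch only: the widths (three 41-bit instructions, a 5-bit template, and 7 + 7 + 7 + 6 + 14 bits inside each instruction) are the standard ones, but the bit positions chosen here are assumptions for illustration, and all names are hypothetical.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template: int, slots: list[int]) -> int:
    """Combine a template and three 41-bit instructions into one 128-bit word."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    bundle = template
    for i, instr in enumerate(slots):
        if not 0 <= instr <= SLOT_MASK:
            raise ValueError("instruction does not fit in 41 bits")
        bundle |= instr << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> tuple[int, list[int]]:
    """Split a 128-bit word back into its template and three instructions."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
             for i in range(3)]
    return template, slots


def split_fields(instr: int) -> dict[str, int]:
    """Split one 41-bit instruction into fields (assumed field order)."""
    return {
        "predicate": instr & 0x3F,              # 6 bits
        "reg1": (instr >> 6) & 0x7F,            # 7 bits
        "reg2": (instr >> 13) & 0x7F,           # 7 bits
        "reg3": (instr >> 20) & 0x7F,           # 7 bits
        "opcode_flags": (instr >> 27) & 0x3FFF  # 14 bits
    }
```

Packing the largest template with three all-ones instructions fills exactly 128 bits, which confirms that 3 × 41 + 5 accounts for the whole word.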

Itanium Instruction Types

Allowable Combinations of IA-64 Instruction Types Assigned to Instruction Slots

IA-64 Instruction Issues

• Maximum number of IA-64 instructions that can be executed for each pairing of bundles.

Intel MMX (MultiMedia eXtensions)

• Vector addition of eight bytes by the Intel PADDB mm0, mm1 instruction:
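
PADDB adds eight byte lanes independently, and each lane wraps around modulo 256 with no carry into its neighbor. The sketch below emulates that behavior on 64-bit Python integers. It models the instruction's semantics rather than executing it, and the function name simply mirrors the mnemonic.

```python
def paddb(a: int, b: int) -> int:
    """Emulate packed byte addition with wraparound on two 64-bit values."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        # Keep only the low 8 bits so that a carry never reaches the next lane.
        result |= (lane_sum & 0xFF) << shift
    return result


if __name__ == "__main__":
    mm0 = 0x0102030405060708
    mm1 = 0x10101010101010FF
    print(hex(paddb(mm0, mm1)))
```

An ordinary 64-bit addition of the same operands would let the overflow from the lowest byte spill into the second byte. The packed form discards it.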

Intel and Motorola Vector Registers

• Intel “aliases” the floating-point registers as MMX registers. This means that the Pentium’s eight floating-point registers do double duty: each 64-bit MMX register occupies the low 64 bits of an 80-bit floating-point register.

• Motorola’s AltiVec implements thirty-two 128-bit vector registers as a new set, separate and distinct from the floating-point registers.

MMX and AltiVec Arithmetic Instructions

Comparing Two MMX Byte Vectors for Equality
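
A packed byte comparison produces a mask rather than a single flag: each result byte is 0xFF where the two operands match and 0x00 where they differ. The sketch below emulates that behavior (the semantics of the MMX PCMPEQB instruction) on 64-bit Python integers. The operand values are arbitrary examples, not those of the original figure.

```python
def pcmpeqb(a: int, b: int) -> int:
    """Emulate a packed byte equality compare on two 64-bit values."""
    mask = 0
    for lane in range(8):
        shift = 8 * lane
        if ((a >> shift) & 0xFF) == ((b >> shift) & 0xFF):
            mask |= 0xFF << shift  # all ones marks an equal lane
    return mask


if __name__ == "__main__":
    print(hex(pcmpeqb(0x1122334455667788, 0x1100334400667700)))
```

The all-ones and all-zeros bytes are convenient because the mask can be fed directly into bitwise AND and OR operations.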

Conditional Assignment of an MMX Byte Vector
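
With a byte mask in hand, a conditional assignment needs no branches: AND the mask with one source, AND its complement with the other, and OR the two results. The sketch below emulates the MMX PAND, PANDN, and POR operations on 64-bit Python integers. The `select_bytes` helper and the operand values are hypothetical, and the instruction sequence in the original figure may differ.

```python
MASK64 = (1 << 64) - 1


def pand(a: int, b: int) -> int:
    """Bitwise AND of two 64-bit values."""
    return a & b


def pandn(a: int, b: int) -> int:
    """(NOT a) AND b, matching the operand order of the PANDN instruction."""
    return (~a & MASK64) & b


def por(a: int, b: int) -> int:
    """Bitwise OR of two 64-bit values."""
    return a | b


def select_bytes(mask: int, if_set: int, if_clear: int) -> int:
    """Take bytes from if_set where mask is 0xFF and from if_clear where 0x00."""
    return por(pand(mask, if_set), pandn(mask, if_clear))


if __name__ == "__main__":
    mask = 0xFF00FFFF00FFFF00
    print(hex(select_bytes(mask, 0xAAAAAAAAAAAAAAAA, 0x5555555555555555)))
```

Every lane is computed for both outcomes and the mask chooses between them, so the eight assignments proceed in lockstep without conditional jumps.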

A PAL Device

PLAs and PALs are similar except that the OR gates in a PAL have a fixed number of inputs and the inputs are not programmable. PALs are more prevalent than PLAs because they are easier to manufacture and are less complex.
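
The PAL structure can be modeled as a programmable AND plane feeding OR gates with fixed wiring. In the sketch below, each product term is a dictionary that maps an input index to the bit it must equal, and each output ORs a fixed-size group of consecutive terms. The representation and names are assumptions for illustration and do not describe any particular device.

```python
from typing import Callable


def make_pal(product_terms: list[dict[int, int]],
             terms_per_output: int) -> Callable[[list[int]], list[int]]:
    """Build a PAL model: programmable AND terms, fixed-size OR groups."""
    if len(product_terms) % terms_per_output != 0:
        raise ValueError("terms must fill every OR gate exactly")

    def evaluate(inputs: list[int]) -> list[int]:
        # AND plane: a term is true when every listed input has the required bit.
        and_outputs = [all(inputs[i] == bit for i, bit in term.items())
                       for term in product_terms]
        # OR plane: each output combines its own fixed group of terms.
        return [int(any(and_outputs[k:k + terms_per_output]))
                for k in range(0, len(and_outputs), terms_per_output)]

    return evaluate


if __name__ == "__main__":
    # XOR of inputs 0 and 1 as the sum of two product terms.
    xor_pal = make_pal([{0: 1, 1: 0}, {0: 0, 1: 1}], terms_per_output=2)
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_pal([a, b]))
```

Only the product terms are configurable here, which mirrors the fixed OR inputs that distinguish a PAL from a PLA.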

Complex Programmable Logic Device

A CPLD combines PAL-like or PLA-like blocks through programmable interconnections. Commercial CPLDs may contain as many as 200,000 equivalent gates and have over 3,000 macrocells.

Field Programmable Gate Array

Unlike CPLDs, which employ large logic blocks and fewer interconnection options, FPGAs employ small logic blocks that can be programmably interconnected.

Quantum Computing

Single-particle interference experiment.

Multi-Valued Logic

Truth tables for binary and ternary comparison functions:

Neural Networks

Model of a living neuron, and model of an artificial neuron (below).

Artificial Neural Network Example

Two simple, feed-forward neural networks with inputs, weights, and thresholds as shown.
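
A threshold neuron can be sketched in a few lines. The weights and thresholds below are hypothetical and are not those shown in the original figure. The sketch also assumes that a neuron fires when its weighted sum is greater than or equal to its threshold.

```python
def neuron(inputs: list[int], weights: list[float], threshold: float) -> int:
    """Fire (1) when the weighted sum of the inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def and_network(x1: int, x2: int) -> int:
    """Single neuron acting as AND: both inputs are needed to reach 2."""
    return neuron([x1, x2], [1, 1], 2)


def or_network(x1: int, x2: int) -> int:
    """Single neuron acting as OR: either input reaches 1."""
    return neuron([x1, x2], [1, 1], 1)


def xor_network(x1: int, x2: int) -> int:
    """Two-layer feed-forward network: OR minus AND gives exclusive-or."""
    hidden = [or_network(x1, x2), and_network(x1, x2)]
    return neuron(hidden, [1, -1], 1)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_network(a, b), or_network(a, b), xor_network(a, b))
```

Changing only the threshold turns the same neuron from AND into OR. Exclusive-or needs a second layer because a single threshold neuron cannot separate its cases.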