Multivector and SIMD Computers
CSE539: Advanced Computer Architecture
Chapter 8
Multivector and SIMD Computers
Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani
Sumit Mittu
Assistant Professor, CSE/IT
Lovely Professional University
In this chapter…
• Vector Processing Principles
• Compound Vector Operations
• Vector Loops and Chaining
• SIMD Computer Implementation Models
VECTOR PROCESSING PRINCIPLES
• Vector Processing Definitions
o Vector
o Stride
o Vector Processor
o Vector Processing
o Vectorization
o Vectorizing Compiler or Vectorizer
• Vector Instruction Types
o Vector-vector instructions
o Vector-scalar instructions
o Vector-memory instructions
• Vector-Vector Instructions
o F1: Vi → Vj
o F2: Vi × Vj → Vk
o Examples: V1 = sin(V2), V3 = V1 + V2
• Vector-Scalar Instructions
o F3: s × Vi → Vj
o Examples: V2 = 6 + V1
• Vector-Memory Instructions
o F4: M → V (Vector Load)
o F5: V → M (Vector Store)
o Examples: X → V1 (load), V2 → Y (store)
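The five instruction types above can be sketched in a few lines of Python. This is only an illustration: plain lists stand in for vector registers and for memory, and the helper names (`vv_unary`, `vv_binary`, `vs_binary`, `vector_load`, `vector_store`) are hypothetical, not taken from the book or from any real instruction set.

```python
import math

# Hypothetical helpers: Python lists stand in for vector registers and memory.

def vv_unary(op, vi):
    """F1: Vi -> Vj, apply a unary operation to every element."""
    return [op(x) for x in vi]

def vv_binary(op, vi, vj):
    """F2: Vi x Vj -> Vk, combine two equal-length vectors elementwise."""
    if len(vi) != len(vj):
        raise ValueError("vector lengths differ")
    return [op(a, b) for a, b in zip(vi, vj)]

def vs_binary(op, s, vi):
    """F3: s x Vi -> Vj, combine one scalar with every element."""
    return [op(s, x) for x in vi]

def vector_load(memory, base, stride, length):
    """F4: M -> V, read `length` words starting at `base`, `stride` apart."""
    return [memory[base + k * stride] for k in range(length)]

def vector_store(memory, base, stride, vec):
    """F5: V -> M, write a register back using base address and stride."""
    for k, x in enumerate(vec):
        memory[base + k * stride] = x

memory = [float(i) for i in range(16)]
v1 = vector_load(memory, base=0, stride=2, length=4)  # words 0, 2, 4, 6
v2 = vs_binary(lambda s, x: s + x, 6, v1)             # V2 = 6 + V1
v3 = vv_binary(lambda a, b: a + b, v1, v2)            # V3 = V1 + V2
v4 = vv_unary(math.sin, v2)                           # V4 = sin(V2)
vector_store(memory, base=1, stride=2, vec=v3)        # words 1, 3, 5, 7
```

The load and store helpers take a base address, a stride and a length, which is the usual way a vector operand in memory is specified.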
• Vector Reduction Instructions
o F6: Vi → s
o F7: Vi × Vj → s
• Gather and Scatter Instructions
o F8: M → Vi × Vj (Gather)
o F9: Vi × Vj → M (Scatter)
• Masking
o F10: Vi × Vm → Vj (Vm is a binary vector)
• Examples…
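As a rough sketch of the reduction, gather, scatter and masking types, the Python below uses lists for registers and memory. All function names are hypothetical. The masking helper assumes one common reading of F10, namely compression: only the elements whose mask bit is 1 are kept.

```python
def reduce_sum(vi):
    """F6: Vi -> s, here the sum of all elements."""
    total = 0
    for x in vi:
        total += x
    return total

def dot(vi, vj):
    """F7: Vi x Vj -> s, here the dot product."""
    return sum(a * b for a, b in zip(vi, vj))

def gather(memory, base, index_vec):
    """F8: M -> Vi x Vj, fetch memory[base + index] for each index."""
    return [memory[base + i] for i in index_vec]

def scatter(memory, base, index_vec, values):
    """F9: Vi x Vj -> M, store values[k] at memory[base + index_vec[k]]."""
    for i, x in zip(index_vec, values):
        memory[base + i] = x

def mask_compress(vi, vm):
    """F10: Vi x Vm -> Vj, keep the elements whose mask bit is 1 (assumed)."""
    return [x for x, bit in zip(vi, vm) if bit]

memory = [10 * i for i in range(12)]          # 0, 10, 20, ..., 110
index = [4, 2, 7, 0]                          # index vector
v1 = gather(memory, base=2, index_vec=index)  # reads words 6, 4, 9, 2
total = reduce_sum(v1)
scatter(memory, base=2, index_vec=index, values=[-1, -2, -3, -4])
kept = mask_compress([5, 0, 7, 0, 9], [1, 0, 1, 0, 1])
```

Gather and scatter differ from plain vector load and store in that the addresses come from an index vector rather than from a fixed stride.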
• Vector-Access Memory Schemes
o Vector-operand Specifications
  • Base address, stride and length
o C-access Memory Organization
  • Low-order m-way interleaved memory
o S-access Memory Organization
  • High-order m-way interleaved memory
o C/S-access Memory Organization
• Early Supercomputers (Vector Processors)
o Cray series, ETA 10E, NEC SX-X 44
o CDC Cyber, Fujitsu VP2600, Hitachi 820/80
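Returning to the vector-access memory schemes: the address mappings behind low-order and high-order interleaving, and the effect of stride on which modules a vector operand touches, can be sketched as follows. The function names and the choice of m = 8 modules are assumptions made for illustration only.

```python
from math import gcd

def low_order_map(address, m):
    """Low-order interleaving: the low-order part of the address picks the module."""
    return address % m, address // m            # (module, word within module)

def high_order_map(address, m, words_per_module):
    """High-order interleaving: the high-order part of the address picks the module."""
    module = address // words_per_module
    if module >= m:
        raise ValueError("address outside memory")
    return module, address % words_per_module   # (module, word within module)

def operand_addresses(base, stride, length):
    """A vector operand is specified by base address, stride and length."""
    return [base + k * stride for k in range(length)]

def modules_touched(base, stride, length, m):
    """Modules visited, in order, under low-order m-way interleaving."""
    return [low_order_map(a, m)[0] for a in operand_addresses(base, stride, length)]

def distinct_modules(stride, m):
    """Number of different modules a long strided access cycles through."""
    return m // gcd(m, stride)

unit_stride = modules_touched(base=0, stride=1, length=8, m=8)
stride_two = modules_touched(base=0, stride=2, length=8, m=8)
stride_eight = modules_touched(base=0, stride=8, length=4, m=8)
```

With unit stride, consecutive elements fall in different modules, so their accesses can overlap. With a stride equal to m, every element falls in the same module and the accesses are serialized.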
• Relative Vector/Scalar Performance
o Vector/scalar speed ratio r
o Vectorization ratio in program f
o Relative performance P is given by:
  • $P = \dfrac{1}{(1-f) + f/r} = \dfrac{r}{(1-f)\,r + f}$
o When f is low, the speedup cannot be high even with a very high r
o Limiting case:
  • P → 1 as f → 0
o Maximum case:
  • P → r as f → 1
o Powerful single-chip processors and multicore systems-on-a-chip provide High-Performance Computing (HPC) using MIMD and/or SPMD configurations with a large number of processors.
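The relative performance formula is easy to evaluate numerically. The short Python sketch below (the function name `relative_performance` is an assumption) computes P for a few values of f and r and shows how weakly P responds to r when f is small.

```python
def relative_performance(f, r):
    """P = 1 / ((1 - f) + f / r) for vectorization ratio f and speed ratio r."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if r <= 0:
        raise ValueError("r must be positive")
    return 1.0 / ((1.0 - f) + f / r)

def relative_performance_alt(f, r):
    """Equivalent form: P = r / ((1 - f) * r + f)."""
    return r / ((1.0 - f) * r + f)

p_none = relative_performance(0.0, 100.0)   # no vectorization
p_half = relative_performance(0.5, 100.0)   # half the program vectorized
p_most = relative_performance(0.9, 10.0)    # 90% vectorized, r = 10
p_full = relative_performance(1.0, 10.0)    # fully vectorized
```

With f = 0.5, even r = 100 gives P just under 2, which is the point of the bullet about low f.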
COMPOUND VECTOR PROCESSING
• Compound Vector Operations
o Compound Vector Functions (CVFs)
  • Composite function of vector operations converted from a looping structure of linked scalar operations
o CVF Example: The SAXPY (Single-precision A times X Plus Y) Code
  • For I = 1 to N
    o Load R1, X(I)
    o Load R2, Y(I)
    o Multiply R1, A
    o Add R2, R1
    o Store Y(I), R2
  • (End of Loop)
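To make the conversion from a scalar loop to a compound vector function concrete, here is a minimal Python sketch of SAXPY in both forms. The scalar version mirrors the five register steps of the loop above; the vector version performs the same work as whole-vector steps. The function names are hypothetical, and Python lists stand in for vector registers.

```python
def saxpy_scalar(a, x, y):
    """Scalar loop: one Load/Load/Multiply/Add/Store sequence per element."""
    for i in range(len(x)):
        r1 = x[i]        # Load R1, X(I)
        r2 = y[i]        # Load R2, Y(I)
        r1 = r1 * a      # Multiply R1, A
        r2 = r2 + r1     # Add R2, R1
        y[i] = r2        # Store Y(I), R2
    return y

def saxpy_vector(a, x, y):
    """Compound vector function: Y = A * X + Y as whole-vector steps."""
    v1 = list(x)                            # vector load X
    v2 = list(y)                            # vector load Y
    v3 = [a * e for e in v1]                # vector-scalar multiply
    v4 = [p + q for p, q in zip(v3, v2)]    # vector-vector add
    y[:] = v4                               # vector store Y
    return y

y_scalar = saxpy_scalar(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
y_vector = saxpy_vector(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

Both calls produce the same Y; the difference on a vector machine is that the second form issues five vector instructions in total instead of five scalar instructions per element.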
• One-dimensional CVF Examples
o V1(I) = V2(I) + V3(I) × V4(I)
o V1(I) = B(I) + C(I)
o A(I) = V1(I) × S + B(I)
o A(I) = V1(I) + B(I) + C(I)
o A(I) = Q × V1(I) × (R × B(I) + C(I)), etc.
Legend:
o Vi(I) are vector registers
o A(I), B(I), C(I) are vectors in memory
o Q, R, S are scalars available from scalar registers or memory
• Vector Loops
o Vector segmentation or strip-mining approach
o Example
• Vector Chaining
o Example: SAXPY code
  • Limited chaining using only one memory-access pipe in the Cray-1
  • Complete chaining using three memory-access pipes in the Cray X-MP
• Functional Unit Independence
• Vector Recurrence
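Strip-mining can be sketched as follows: a vector longer than the vector register is cut into segments no longer than the register, and each segment is processed as one vector loop iteration. The sketch assumes a maximum vector length of 64 elements (the vector register length of the Cray-1) and uses hypothetical function names; it is not the book's example.

```python
def strip_mine(n, mvl=64):
    """Yield (start, length) segments covering 0..n-1, each at most mvl long."""
    if mvl <= 0:
        raise ValueError("mvl must be positive")
    start = 0
    while start < n:
        length = min(mvl, n - start)
        yield start, length
        start += length

def saxpy_strip_mined(a, x, y, mvl=64):
    """SAXPY over a long vector, one register-sized segment at a time."""
    for start, length in strip_mine(len(x), mvl):
        seg = slice(start, start + length)
        # One pass of the vector loop: load, multiply, add, store a segment.
        y[seg] = [a * xi + yi for xi, yi in zip(x[seg], y[seg])]
    return y

segments = list(strip_mine(150, mvl=64))
x = [1.0] * 150
y = [float(i) for i in range(150)]
result = saxpy_strip_mined(3.0, x, y)
```

For a 150-element vector this yields two full segments of 64 and a remainder of 22. Some presentations handle the remainder segment first; the order does not change the result.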
SIMD COMPUTER ORGANIZATIONS
• SIMD Computer Variants
o Array Processor
o Associative Processor
• SIMD Processor vs. SISD vs. Vector Processor Operation
o Illustration: for (i = 0; i < 5; i++) a[i] = a[i] + 2;
o Lockstep mode of operation in SIMD processor
o Relative performance comparison
• SIMD Implementation Models
o Distributed Memory Model
  • E.g. Illiac IV
o Shared Memory Model
  • E.g. BSP (Burroughs Scientific Processor)
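The illustration a[i] = a[i] + 2 can be modelled in Python to contrast SISD and SIMD operation. This is a toy model under stated assumptions: the class `ProcessingElement` and the functions `sisd_add` and `simd_broadcast` are hypothetical, each processing element (PE) holds one array element in its local memory as in the distributed memory model, and the instruction counts ignore loop overhead.

```python
def sisd_add(a, k):
    """SISD: one processor walks the array, one element per instruction step."""
    steps = 0
    for i in range(len(a)):
        a[i] = a[i] + k
        steps += 1
    return steps

class ProcessingElement:
    """One PE with a single word of local memory (distributed memory model)."""
    def __init__(self, value):
        self.local = value
        self.enabled = True

    def execute(self, op, operand):
        if self.enabled:
            self.local = op(self.local, operand)

def simd_broadcast(pes, op, operand):
    """The control unit broadcasts one instruction; all PEs run it in lockstep."""
    for pe in pes:                 # sequential here, simultaneous in hardware
        pe.execute(op, operand)
    return 1                       # a single instruction step

a = [1, 2, 3, 4, 5]
sisd_steps = sisd_add(a, 2)

pes = [ProcessingElement(v) for v in [1, 2, 3, 4, 5]]
simd_steps = simd_broadcast(pes, lambda x, k: x + k, 2)
simd_result = [pe.local for pe in pes]
```

Both versions compute the same array, but the SISD loop needs five add steps while the SIMD array needs one, because every PE executes the same broadcast instruction on its own data.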
• SIMD Instructions
o Scalar Operations
  • Arithmetic/Logical
o Vector Operations
  • Arithmetic/Logical
o Data Routing Operations
  • Permutations, broadcasts, multicasts, rotation and shifting
o Masking Operations
  • Enable/Disable PEs
• Host and I/O
• Bit-slice and Word-slice Processing
o WSBS (word-serial, bit-serial), WSBP (word-serial, bit-parallel), WPBS (word-parallel, bit-serial), WPBP (word-parallel, bit-parallel)
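The data routing and masking operations listed under SIMD Instructions can be sketched on a row of PEs, modelled here as a Python list with one value per PE. The function names are hypothetical and the direction conventions (rotation and shifting move data toward higher-numbered PEs) are assumptions chosen for the example.

```python
def rotate(values, k):
    """Rotation: the value in PE i moves to PE (i + k) mod n."""
    n = len(values)
    return [values[(i - k) % n] for i in range(n)]

def shift(values, k, fill=0):
    """Shifting: like rotation, but values pushed off the end are lost."""
    n = len(values)
    return [values[i - k] if 0 <= i - k < n else fill for i in range(n)]

def broadcast(values, src):
    """Broadcast: every PE receives the value held by PE `src`."""
    return [values[src]] * len(values)

def permute(values, dest):
    """Permutation: the value in PE i is routed to PE dest[i]."""
    if sorted(dest) != list(range(len(values))):
        raise ValueError("dest must be a permutation of the PE indices")
    out = [None] * len(values)
    for i, d in enumerate(dest):
        out[d] = values[i]
    return out

def masked_apply(values, mask, op):
    """Masking: only PEs whose mask bit is 1 execute the operation."""
    return [op(v) if bit else v for v, bit in zip(values, mask)]

pe_values = [1, 2, 3, 4]
rotated = rotate(pe_values, 1)
shifted = shift(pe_values, 1)
everyone = broadcast(pe_values, 2)
swapped = permute(pe_values, [1, 0, 3, 2])
masked = masked_apply(pe_values, [1, 0, 1, 0], lambda v: v + 10)
```

A multicast would be the same idea as the broadcast restricted to a subset of destination PEs, which can be expressed by combining `broadcast` with a mask.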