CPE411 Parallel and Distributed Computing
Week 1: Introduction
Pruet Boonma
Department of Computer Engineering
Faculty of Engineering, Chiang Mai University
In this class
• Parallel Computing
• Architecture, paradigms and issues
• Shared memory vs. message passing
• Operating systems and middleware
• Algorithm model and complexity
• Distributed Computing
• Architecture, paradigms and issues
• Tier vs. peer-to-peer architectures
• Operating systems and middleware
• Algorithm model and complexity
• Hardware support
• Advanced topics
What I want from you
• Your original work.
• I have zero tolerance for plagiarism in my class.
• That means if you copy your friend’s work, you get an F.
• It’s OK to submit incomplete or buggy work.
• We can discuss how to make it better.
• Your attention.
• There is no attendance score, so it’s OK not to attend my class.
• But you still have to submit your work.
• If you’re in class, please respect me and your friends.
• No cell-phone ringing, no snoring (sleeping quietly is OK), no smelly food (food/drink is OK in the lecture room).
Grading
● Homework 30%
● Report/Presentation 40%
● Midterm 10%
● Final 20%
● A – [85%, 100%]
● B+ – [80%, 85%)
● B – [75%, 80%)
● C+ – [70%, 75%)
● C – [65%, 70%)
● You don't want anything below C... believe me
Introduction
Let's start with some terminology
Parallel vs. Distributed vs. Concurrent Computing
Introduction
Parallel computing is a computational approach where a problem is decomposed
into smaller sub-problems that are solved simultaneously using multiple processors.
Think of it as multiple workers laying bricks on the same house.
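The slides have no code, but the decompose-and-solve-simultaneously idea can be sketched in a few lines of Python. This is only an illustration (the function names, chunk size, and worker count here are my own choices, not part of the course material):

```python
# A minimal sketch of parallel computing: one problem (summing 1..100)
# is decomposed into small sub-problems, each solved by a separate
# worker process, and the partial results are combined at the end.
from multiprocessing import Pool

def partial_sum(chunk):
    # each worker solves one small sub-problem independently
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1, 101))
    # decompose into 4 chunks of 25 numbers ("4 bricklayers")
    chunks = [data[i:i + 25] for i in range(0, 100, 25)]
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, chunks)  # solved simultaneously
    print(sum(partials))  # prints 5050
```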
Introduction
Distributed computing is a computational approach where a collection of
autonomous computers that communicate through a computer network tries to solve a
problem together.
Think of it as soccer players trying to play together as a team.
Introduction
Concurrent computing is a computational approach where programs are designed as
collections of interacting computational processes that may be executed in parallel.
Think of it as a student trying to listen to a lecture and browse Facebook at the same time.
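The lecture-plus-Facebook picture can be modeled as two threads making progress in overlapping time periods, even on one CPU core. A minimal Python sketch (the task names are illustrative):

```python
# A minimal sketch of concurrent computing: two logical tasks are
# structured as interacting threads that may run in parallel.
import threading

notes, feed = [], []

def listen_to_lecture():
    for i in range(3):
        notes.append(f"slide {i}")   # one ongoing activity

def browse_facebook():
    for i in range(3):
        feed.append(f"post {i}")     # another, at the same time

t1 = threading.Thread(target=listen_to_lecture)
t2 = threading.Thread(target=browse_facebook)
t1.start(); t2.start()               # both tasks are now in progress
t1.join(); t2.join()
print(len(notes), len(feed))         # prints 3 3
```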
Introduction
Then, what are the differences?
Introduction
Parallel computing is a computational approach where a problem is decomposed into smaller sub-problems that are solved
simultaneously using multiple processors.
Distributed computing is a computational approach where a collection of autonomous computers that
communicate through a computer network tries to solve a problem together.
Concurrent computing is a computational approach where programs are designed as collections of interacting
computational processes that may be executed in parallel.
Parallel Architectures
There are many ways to classify parallel architectures.
One of the most frequently used is Flynn's taxonomy of computer architecture, which
classifies parallelism based on instruction and data flows.
What are instruction and data flows?
Parallel Architectures
This is a simple Von Neumann architecture:
[Diagram: memory (program/data), control unit, input and output, connected by instruction and data paths]
Parallel Architectures
Flynn's taxonomy: SISD, SIMD, MISD, MIMD
S = Single, M = Multiple
I = Instruction, D = Data
SISD
Single instruction, single data stream. AKA synchronous architecture or sequential computer.
SIMD
Single instruction, multiple data streams. For example, GPUs.
MISD
Multiple instructions, single data stream. Used in fault-tolerant systems, e.g., the space shuttle computers.
MIMD
Multiple instructions, multiple data streams. E.g., distributed systems.
Parallel Architecture
The other way to classify is by looking at the level of parallelism, from fine-grained to coarse-grained:
Bit-level parallelism
Instruction-level parallelism
Data parallelism
Task parallelism
Bit-level parallelism
Well, most computers now have bit-level parallelism.
For example, an 8-bit CPU can process 8 bits of data simultaneously.
Increasing the number of bits (per word) can speed up computation.
But what are the limitations and trade-offs?
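To see what the word width buys, here is an illustrative Python sketch (my own example, not from the slides): a machine with no bit-level parallelism would have to ripple a carry through one bit at a time, while an 8-bit CPU does the same 8-bit addition as a single operation.

```python
# A minimal sketch of bit-level parallelism: adding two 8-bit numbers
# one bit at a time, the way a 1-bit datapath would, versus in one
# native 8-bit operation.
def add_bitwise(a, b, width=8):
    result, carry = 0, 0
    for i in range(width):                      # 8 sequential 1-bit steps
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        s = abit ^ bbit ^ carry                 # full-adder sum bit
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        result |= s << i
    return result & ((1 << width) - 1)

# an 8-bit CPU gets the same answer in a single add instruction
print(add_bitwise(100, 27))   # prints 127, same as (100 + 27) & 0xFF
```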
Instruction-level parallelism
Modern computers can execute multiple instructions simultaneously using a pipeline.
For example, the Pentium 4 has a 35-stage pipeline, so it can have up to 35 instructions in
flight at a time.
The Intel Core architecture reduces the pipeline to 14 stages, because the penalty cost of a
long pipeline (e.g., on a branch misprediction) is too high.
Data Parallelism
Data parallelism is performed at the application-code level, especially in loops.
If an instruction inside a loop is performed on different data in each iteration, then the instruction can be performed concurrently on the different data.
It's a kind of SIMD.
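A loop whose iterations touch different data items can be run sequentially or in a data-parallel way with identical results. A minimal Python sketch (the conversion function and worker count are my own illustrative choices):

```python
# A minimal sketch of data parallelism: the same instruction
# (a Fahrenheit-to-Celsius conversion) is applied to a different data
# item in each loop iteration, so the iterations are independent and
# can run on different processors at once.
from multiprocessing import Pool

def to_celsius(f):
    # one operation, applied per data item
    return (f - 32) * 5 / 9

if __name__ == "__main__":
    temps_f = [32.0, 50.0, 68.0, 86.0]
    # sequential loop: one iteration at a time
    sequential = [to_celsius(f) for f in temps_f]
    # data-parallel version: iterations distributed over workers
    with Pool(processes=2) as pool:
        parallel = pool.map(to_celsius, temps_f)
    print(parallel == sequential)   # prints True
```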
Task Parallelism
Entirely different tasks run at the same time on different processors. Think of it as distributed computing.
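In contrast to data parallelism, here each worker runs a *different* task, like players with different roles on one team. A minimal Python sketch (the two tasks are illustrative):

```python
# A minimal sketch of task parallelism: two different tasks are
# submitted to different workers and run at the same time.
from concurrent.futures import ThreadPoolExecutor

def compute_total(data):
    return sum(data)        # task 1: one kind of work

def compute_peak(data):
    return max(data)        # task 2: a different kind of work

data = [3, 1, 4, 1, 5, 9]
with ThreadPoolExecutor(max_workers=2) as ex:
    total = ex.submit(compute_total, data)
    peak = ex.submit(compute_peak, data)
    print(total.result(), peak.result())   # prints 23 9
```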
Current Trends in Parallel Architecture
Multi-core processor
Massively Parallel Processing (MPP)
General-Purpose Computing on Graphics Processing Units (GPGPU)
Vector Processing
Multi-core Processor
Multiple processing units (PUs) in a single package/die.
They share main memory (RAM) but can have separate cache memory.
The cores can all share the same characteristics or differ, e.g., Intel multi-core processors vs. the PS3 Cell processor.
Multi-core processors can use simultaneous multithreading (e.g., Hyper-Threading) to increase parallelism.
MPP
MPP is a computer with many networked processors.
Many == 100++ processors.
Each processor has its own memory and a copy of the OS + application, and connects to the others through a high-speed interconnect network (e.g., 100 Gbps++).
An example of such a system is IBM's Deep Blue (30 × 120 MHz RISC CPUs + 480 chess chips).
GPGPU
Computer graphics processing (3D rendering, texturing, shading) is, by nature, well suited to data-parallel operations.
GPUs are heavily optimized for those kinds of tasks.
So, GPGPU utilizes the GPU to perform non-graphics parallel operations.
Examples of this technology: CUDA, OpenCL
Vector Processing
Well, it's SIMD.
For example, you can perform A = B×C, where A, B and C are vectors, with only one
instruction.
Instead of 2|B| instructions.
Examples: the Cray-1 supercomputer, Intel Streaming SIMD Extensions (SSE)
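Pure Python has no vector instructions, so this sketch only models the idea: `vector_mul` stands in for the single vector instruction, while `scalar_mul` shows the per-element loop a scalar CPU would execute (both functions are my own illustrative stand-ins):

```python
# A minimal model of vector processing: elementwise A = B x C.
def scalar_mul(B, C):
    # a scalar CPU issues one multiply per element: |B| iterations
    A = []
    for b, c in zip(B, C):
        A.append(b * c)
    return A

def vector_mul(B, C):
    # stands in for ONE vector instruction acting on all lanes at once
    return [b * c for b, c in zip(B, C)]

B = [1, 2, 3, 4]
C = [5, 6, 7, 8]
print(vector_mul(B, C))                       # prints [5, 12, 21, 32]
print(scalar_mul(B, C) == vector_mul(B, C))   # prints True
```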
Quad-Core Processor
Cell Processor
Deep Blue/BlueGene
Cray-1
What's next?
Paradigms and issues.