Parallel Algor

  • 8/4/2019 Parallel Algor

    1/27

    Multiple Processor Organization (Flynn's Classification)

    Single instruction, single data stream - SISD

    Single instruction, multiple data stream - SIMD

    Multiple instruction, single data stream - MISD

    Multiple instruction, multiple data stream - MIMD

    Single Instruction, Single Data Stream - SISD

    Single processor

    Single instruction stream

    Data stored in single memory

    Uni-processor

    Parallel Organizations - SISD

    Single Instruction, Multiple Data Stream - SIMD

    Each of the N processors possesses its own local

    memory where it can store both programs and data.

    All processors operate under the control of a single

    instruction stream issued by a central control unit.

    Each instruction is executed on a different set of data by different processors.

    To exchange data or intermediate results, the processors use one of two techniques: in some SIMD computers communication is through a shared memory, and in others it is done via an interconnection network.

    Parallel Organizations - SIMD

    Shared-Memory (SM) SIMD Computers.

    N processors share a common memory

    When two processors wish to communicate, they do

    so through the shared memory.

    Interconnection-Network SIMD Computers.

    The M locations of the shared memory are distributed among the N processors, each receiving M/N locations.

    Multiple Instruction, Single Data Stream - MISD

    N processors, each with its own control unit, share a common memory unit.

    At each step, one datum received from memory is operated upon by all the processors simultaneously, each according to the instruction it receives from its control unit.

    Multiple Instruction, Multiple Data Stream - MIMD

    There are N processors, N streams of instructions, and N streams of data.

    Each processor has its own control unit in addition to its local memory and arithmetic and logic unit.

    Communication between processors is performed through a shared memory or an interconnection network.

    MIMD computers sharing a common memory are often referred to as multiprocessors (or tightly coupled machines), while those with an interconnection network are known as multicomputers (or loosely coupled machines).

    Parallel Organizations - MIMD Shared Memory

    Parallel Organizations - MIMD Distributed Memory

    MIMD - contd

    If all the processors are in close proximity to one another (they are all in the same room, say), then they are a multicomputer; otherwise (they are in different cities, say) they are a distributed system.

    Merging

    Merging is defined as follows: let A = {a1, a2, ..., ar} and B = {b1, b2, ..., bs} be two sequences of numbers sorted in nondecreasing order; it is required to merge A and B, that is, to form a third sequence C = {c1, c2, ..., cr+s}, also sorted in nondecreasing order.

    Sequential Merging
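The body of this slide did not survive extraction. As a hedged illustration of merging two sorted sequences on a single processor, the sketch below uses the standard two-pointer method. The function name `sequential_merge` is an assumption introduced here and is not taken from the slides.

```python
def sequential_merge(a, b):
    """Merge two nondecreasing lists a and b into one nondecreasing list."""
    c = []
    i = j = 0
    # Repeatedly copy the smaller of the two front elements into c.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # One input is exhausted; the rest of the other is already sorted.
    c.extend(a[i:])
    c.extend(b[j:])
    return c


if __name__ == "__main__":
    print(sequential_merge([1, 3, 5], [2, 4, 6, 8]))
```

Each element is copied exactly once, so this approach takes time proportional to r + s.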

    Parallel Merging network

    A collection of very simple processors communicating through a special-purpose network is called an (r, s)-merging network.

    Each processor receives two inputs and produces two outputs: it compares the two values and places the smaller on its top output line and the larger on its bottom output line.

    Using these processors, we proceed to build a network that takes as input the two sorted sequences A = {a1, a2, ..., ar} and B = {b1, b2, ..., bs} and produces as output a single sorted sequence C = {c1, c2, ..., cr+s}.

    Parallel Merging network : contd

    Use the following assumptions:

    1. the two input sequences are of the same size, that is, r = s = n ≥ 1, and

    2. n is a power of 2.

    When n = 1, a single processor clearly suffices:

    Parallel Merging network : contd

    When n = 2, the two sequences A = {a1, a2} and B = {b1, b2} are correctly merged by the network as:
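The figure for this case is missing from the extracted text. As a rough sketch of one wiring that merges two sorted pairs with three two-input processors, the code below compares a1 with b1 and a2 with b2 in a first stage, then compares the two middle values in a second stage. The names `comparator` and `merge_2_2` are assumptions for illustration only.

```python
def comparator(x, y):
    """Model of one processor: smaller value on the top line, larger on the bottom."""
    return (min(x, y), max(x, y))


def merge_2_2(a, b):
    """Merge sorted pairs a = [a1, a2] and b = [b1, b2] using three comparators."""
    a1, a2 = a
    b1, b2 = b
    # Stage 1: these two comparisons are independent and could run in parallel.
    d1, d2 = comparator(a1, b1)
    e1, e2 = comparator(a2, b2)
    # Stage 2: d1 is the overall minimum and e2 the overall maximum;
    # only the two middle values still need ordering.
    c2, c3 = comparator(d2, e1)
    return [d1, c2, c3, e2]


if __name__ == "__main__":
    print(merge_2_2([2, 5], [3, 4]))
```

This wiring uses two time steps and three processors.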

    Parallel Merging network : contd

    First, the odd-numbered elements of A and B, that is, {a1, a3, a5, ..., an-1} and {b1, b3, b5, ..., bn-1}, are merged using an (n/2, n/2)-merging network to produce a sequence {d1, d2, d3, ..., dn}.

    Simultaneously, the even-numbered elements of the two sequences, {a2, a4, a6, ..., an} and {b2, b4, b6, ..., bn}, are also merged using an (n/2, n/2)-merging network to produce a sequence {e1, e2, e3, ..., en}.

    The final sequence {c1, c2, ..., c2n} is now obtained from: c1 = d1, c2n = en, c2i = min(di+1, ei), and c2i+1 = max(di+1, ei), for i = 1, 2, ..., n - 1.
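The recursive construction can be simulated on one processor as a sanity check. The sketch below assumes both inputs have the same power-of-2 length and are sorted in nondecreasing order. It runs the two sub-merges one after the other, whereas the network performs them at the same time. The function name `odd_even_merge` is an assumption introduced here.

```python
def odd_even_merge(a, b):
    """Simulate an (n, n)-merging network on two sorted lists of length n."""
    n = len(a)
    # Assumptions from the slides: equal sizes, n >= 1, n a power of 2.
    assert n == len(b) and n >= 1 and n & (n - 1) == 0
    if n == 1:
        # Base case: a single comparator suffices.
        return [min(a[0], b[0]), max(a[0], b[0])]
    # Odd-numbered elements a1, a3, ... sit at Python indices 0, 2, ...
    d = odd_even_merge(a[0::2], b[0::2])
    # Even-numbered elements a2, a4, ... sit at Python indices 1, 3, ...
    e = odd_even_merge(a[1::2], b[1::2])
    c = [d[0]]                      # c1 = d1
    for i in range(1, n):           # i = 1, ..., n - 1
        # d[i] is d_{i+1} and e[i - 1] is e_i in the slides' 1-based notation.
        c.append(min(d[i], e[i - 1]))   # c_{2i}
        c.append(max(d[i], e[i - 1]))   # c_{2i+1}
    c.append(e[n - 1])              # c_{2n} = e_n
    return c


if __name__ == "__main__":
    print(odd_even_merge([2, 4, 6, 8], [1, 3, 5, 7]))
```

The final loop corresponds to the last column of n - 1 comparators in the network; in hardware all of them fire in the same time step.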

    Parallel Merging network : contd

    Fig: Odd-Even Merging network

    Parallel Merging network : Analysis

    Running Time: assume that a processor can read its input, perform a comparison, and produce its output all in one time unit.

    Let t(2n) denote the time required by an (n, n)-merging network to merge two sequences of length n each. Then:

    t(2) = 1 for n = 1

    t(2n) = t(n) + 1 for n > 1

    whose solution is t(2n) = 1 + log n (logarithm to base 2).
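Unrolling the recurrence makes the closed form explicit. The derivation below assumes n = 2^k, in line with the power-of-2 assumption, and base-2 logarithms; the symbol k is introduced here for illustration.

```latex
\begin{align*}
t(2n) &= t(n) + 1 \\
      &= t(n/2) + 2 \\
      &\;\;\vdots \\
      &= t(2) + k \\
      &= 1 + \log_2 n, \qquad n = 2^k .
\end{align*}
```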

    Parallel Merging network : Analysis

    Number of Processors:

    Let p(2n) denote the number of processors in an (n, n)-merging network. Again, we have a recurrence:

    p(2) = 1 for n = 1

    p(2n) = 2p(n) + (n - 1) for n > 1

    whose solution is p(2n) = 1 + n log n.

    Cost:

    c(2n) = p(2n) * t(2n) = O(n log^2 n).
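The closed forms can be checked numerically against the recurrences for small powers of 2. In the sketch below the helper names `t`, `p`, and `network_stats` are assumptions, and the argument `m` stands for 2n, the total number of inputs to the network.

```python
def t(m):
    """Time of a merging network with m = 2n inputs: t(2) = 1, t(2n) = t(n) + 1."""
    return 1 if m == 2 else t(m // 2) + 1


def p(m):
    """Processor count: p(2) = 1, p(2n) = 2 p(n) + (n - 1)."""
    if m == 2:
        return 1
    n = m // 2
    return 2 * p(n) + (n - 1)


def network_stats(n):
    """Return (time, processors, cost) for an (n, n)-merging network."""
    return t(2 * n), p(2 * n), t(2 * n) * p(2 * n)


if __name__ == "__main__":
    for k in range(0, 6):
        n = 2 ** k
        time, procs, cost = network_stats(n)
        # Closed forms from the slides, with log taken to base 2 (log n = k).
        assert time == 1 + k
        assert procs == 1 + n * k
        print(n, time, procs, cost)
```

For n = 4, for example, the recurrences give 3 time units and 9 processors, so the cost is 27.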

    MERGING ON THE CREW MODEL: contd

    A CREW SM SIMD computer consists of N processors P1, P2, ..., PN.

    A parallel algorithm for this computer takes the two

    sequences A and B as input and produces the sequence

    C as output, as defined earlier. Assuming r

    MERGING ON THE CREW MODEL : contd

    MERGING ON THE CREW MODEL : contd

    The parallel merging algorithm for a shared-memory computer is presented as procedure CREW MERGE.

    procedure CREW MERGE (A, B, C)

    Step 1:

    MERGING ON THE CREW MODEL : contd

    Step 2:

    MERGING ON THE CREW MODEL : contd

    Step 3:
