High Performance Computer

  • 8/10/2019 High Performance Computer

    High Performance Parallel Supercomputer

    Dien Taufan Lessy, MCSCE

    Dien Taufan Lessy (3011464), High Performance Parallel Supercomputer

    Spring Semester 2014

    Contents

    Introduction
    Parallelism
    Future Research
    Conclusion
    References
    Literature

    Introduction

    The first supercomputer: IBM Naval Ordnance Research Calculator (NORC)

    15,000 operations/s: ADD (15 µs), MUL (31 µs), DIV (227 µs)

    [1]

    The first supercomputer: Control Data Corporation 6600, 1 MFLOPS

    [2]

    Today (November 2013): Tianhe-2 (MilkyWay-2)

    [3]

    Cores: 3,120,000
    Rmax: 33,862.7 TFLOP/s
    Power: 17,808 kW

    Today's Ranking

    [3]

    HPC Vendor

    [3]

    Processor Generation

    [3]

    User Segment

    [3]

    OS

    [3]

    Parallelism

    History

    [3]

    Concept and Terminology

    The von Neumann Computer

    Walk-through: c = a + b

    1. Get next instruction
    2. Decode: fetch a
    3. Fetch a to internal register
    4. Get next instruction
    5. Decode: fetch b
    6. Fetch b to internal register
    7. Get next instruction
    8. Decode: add a and b (c in register)
    9. Do the addition in the ALU
    10. Get next instruction
    11. Decode: store c in main memory
    12. Move c from internal register to main memory

    Note: Some units are idle while others are working, which wastes cycles. Pipelining (modularization) and caching (advance decoding) introduce parallelism.
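The walk-through above can be sketched as a toy machine. This is only an illustration: the opcodes `LOAD`, `ADD`, `STORE`, the register names, and the `run` function are hypothetical and not part of the slides. Each instruction goes through fetch, decode, and execute strictly in sequence, so only one unit is busy at any step.

```python
# Toy model of the walk-through: a hypothetical 4-instruction program
# for c = a + b, executed strictly one step at a time.

def run(program, memory):
    """Execute the program; return the final memory and a log of every step."""
    registers = {}
    log = []
    for op, *args in program:              # "get next instruction"
        log.append(f"fetch instruction {op}")
        log.append(f"decode {op} {' '.join(args)}")
        if op == "LOAD":                   # main memory -> internal register
            reg, addr = args
            registers[reg] = memory[addr]
        elif op == "ADD":                  # ALU operation on registers
            dst, lhs, rhs = args
            registers[dst] = registers[lhs] + registers[rhs]
        elif op == "STORE":                # internal register -> main memory
            reg, addr = args
            memory[addr] = registers[reg]
        else:
            raise ValueError(f"unknown opcode: {op}")
        log.append(f"execute {op}")
    return memory, log

PROGRAM = [
    ("LOAD", "r1", "a"),
    ("LOAD", "r2", "b"),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", "c"),
]

memory, log = run(PROGRAM, {"a": 2, "b": 3, "c": 0})
print(memory["c"])   # 5
print(len(log))      # 12 sequential steps, matching the 12 steps listed above
```

A pipelined design would overlap the fetch of one instruction with the decode and execute of earlier ones instead of running all 12 steps back to back.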

    Increasing Clock Frequency: Moore's Law

    Increasing Clock Frequency: core voltage increases with frequency

    [6]

    Power Cost of Frequency
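A commonly used first-order model of dynamic power may help here. It is an assumption for illustration, not taken from the slide: $C$ (switched capacitance), $V$ (core voltage), and $f$ (clock frequency) are symbols introduced only for this sketch, and the proportionality $V \propto f$ is a rough approximation.

```latex
P_{\text{dynamic}} \approx C\,V^{2} f,
\qquad
V \propto f \;\Rightarrow\; P_{\text{dynamic}} \propto f^{3}
```

Under these assumptions, doubling the clock frequency would cost roughly eight times the power.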

    A high-performance serial processor needs high power

    Processor-Memory Gap (Bottleneck)

    Definition

    Concurrent vs Parallel

    "A parallel computer is a collection of processing elements that communicate and cooperate to solve large problems quickly" - Almasi and Gottlieb, 1989

    Speedup vs Efficiency

    For a given problem:

    speedup(using P processors) = exec. time (1 processor) / exec. time (P processors)

    10 processors with a 2x speedup?
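The question on this slide can be worked through with a small sketch. The speedup formula is the one given above; efficiency is taken here as speedup divided by the processor count, which is the usual definition but is an assumption, since the slide text does not spell it out. The function names and timings are hypothetical.

```python
def speedup(time_one: float, time_p: float) -> float:
    """Speedup = execution time on 1 processor / execution time on P processors."""
    return time_one / time_p


def efficiency(time_one: float, time_p: float, processors: int) -> float:
    """Efficiency = speedup / P (usual definition, assumed here)."""
    return speedup(time_one, time_p) / processors


# Hypothetical timings: 100 s on one processor, 50 s on ten processors.
s = speedup(100.0, 50.0)
e = efficiency(100.0, 50.0, 10)
print(s, e)  # 2.0 0.2
```

Ten processors delivering only a 2x speedup means each processor is doing useful work about 20% of the time.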

    Serial vs Parallel

    [3]

    Processor Type

    Scalar processor
      CISC: Complex Instruction Set Computer
        Intel 80x86 (IA32)
      RISC: Reduced Instruction Set Computer
        Sun SPARC, IBM Power #, SGI MIPS
      VLIW: Very Long Instruction Word; explicitly parallel instruction computing (EPIC); probably dying
        Intel IA64 (Itanium)

    Vector processor
      Cray X1/T90; NEC SX#; Japan Earth Simulator; early Cray machines; Japan Life Simulator (hybrid)

    CISC vs RISC vs VLIW

    [5]

    [3]

    Flynn's Classical Taxonomy

    [3]

    SISD

    [3]

    SIMD

    [3]

    MISD

    [3]

    MIMD

    Memory Architecture: Shared Memory

    Superscalar processors with L2 cache are connected to memory modules through a bus or crossbar.
    All processors have access to all machine resources, including memory and I/O devices.
    SMP (symmetric multiprocessor): the processors are all the same and have equal access to machine resources, i.e. the system is symmetric.
    SMPs are UMA (Uniform Memory Access) machines, e.g. a node of an IBM SP machine; Sun Ultra Enterprise 10000.

    Memory Architecture: Shared Memory

    If bus:
      Only one processor can access the memory at a time.
      Processors contend for the bus to access memory.

    If crossbar:
      Multiple processors can access memory through independent paths.
      Contention occurs when different processors access the same memory module.
      A crossbar can be very expensive.

    Processor count is limited by memory contention and bandwidth; the maximum is usually 64 or 128.
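The shared-memory programming style can be illustrated in software with threads. This is a loose analogy, not a hardware model: the threads read one shared list directly, and a lock plays the role of the single path that updates must contend for. All names here (`data`, `shared`, `worker`, `NUM_THREADS`) are hypothetical.

```python
import threading

data = list(range(1, 1001))     # shared data, visible to every thread
shared = {"total": 0}           # shared accumulator
lock = threading.Lock()         # serializes updates to the accumulator


def worker(start: int, stop: int) -> None:
    """Sum one slice of the shared list and add it to the shared total."""
    partial = sum(data[start:stop])   # reads shared memory directly
    with lock:                        # threads contend for the lock here
        shared["total"] += partial


NUM_THREADS = 4
chunk = len(data) // NUM_THREADS
threads = [
    threading.Thread(target=worker, args=(i * chunk, (i + 1) * chunk))
    for i in range(NUM_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["total"])  # 500500
```

No data is copied or sent between workers; the cost shows up instead as contention on the one protected update.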

    Memory Architecture: Distributed Memory

    Superscalar processors with local memory are connected through a communication network.
    Each processor can only work on data in its local memory.
    Access to remote memory requires explicit communication.
    Present-day large supercomputers are all some sort of distributed-memory machine.
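The distributed-memory programming style can be mimicked in a single process. This sketch is a simulation under stated assumptions: Python threads really do share memory, so the separation is by convention only. Each hypothetical `node` touches nothing but its own partition and reports its result through a queue that stands in for the communication network. Real machines would typically use a message-passing library instead.

```python
import queue
import threading


def node(rank: int, local_data: list, outbox: queue.Queue) -> None:
    """Work only on local_data, then send the result as an explicit message."""
    partial = sum(local_data)
    outbox.put((rank, partial))


# Data is partitioned up front; no node is handed another node's slice.
partitions = [
    list(range(1, 251)),
    list(range(251, 501)),
    list(range(501, 751)),
    list(range(751, 1001)),
]
network = queue.Queue()   # stands in for the interconnect

nodes = [
    threading.Thread(target=node, args=(rank, part, network))
    for rank, part in enumerate(partitions)
]
for n in nodes:
    n.start()
for n in nodes:
    n.join()

# The root collects one message per node and reduces them.
results = dict(network.get() for _ in partitions)
total = sum(results.values())
print(total)  # 500500
```

The only way a partial result leaves a node is through a message, which is the property the slide describes as explicit communication.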

    Memory Architecture: Hybrid Distributed-Shared Memory

    Overall distributed memory, with SMP nodes.
    Most modern supercomputers and workstation clusters are of this type.
    Message passing, or hybrid message passing/threading.

    [3]

    Amdahl's Law

    Suppose only part of an application seems parallel.

    Amdahl's law: let s be the fraction of work done sequentially, so (1 - s) is the fraction that is parallelizable.

    P = number of processors

    Speedup(P) = Time(1)/Time(P)
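Amdahl's law is usually stated as Speedup(P) = 1 / (s + (1 - s)/P), which is bounded above by 1/s. The sketch below assumes that standard form, using the slide's s and P; the function name `amdahl_speedup` and the 10% sequential fraction are hypothetical choices for illustration.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Speedup(P) = 1 / (s + (1 - s) / P) for sequential fraction s."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (s + (1.0 - s) / p)


# Hypothetical workload: 10% of the work is sequential.
for p in (1, 10, 100, 1000):
    print(p, round(amdahl_speedup(0.1, p), 2))
```

With s = 0.1 the speedup approaches but never reaches 1/s = 10, however many processors are added.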

    Conclusion

    Improving a single instruction stream requires a lot of effort for little gain.
    Parallel computing is the only way to achieve higher performance in the foreseeable future.
    A supercomputer combines all parallel computing technologies, such as parallel CPUs, multicore, scalar, and vector processors.

    References

    [1]: http://www.columbia.edu/cu/computinghistory/norc.html, May 2014
    [2]: http://en.wikipedia.org/wiki/File:CDC_6600.jc.jpg
    [3]: http://www.top500.org/, May 2014
    [4]: https://computing.llnl.gov/tutorials/parallel_comp/
    [5]: http://15418.courses.cs.cmu.edu/spring2014/lecture/whyparallelism
    [6]: http://www.intel.com
    [7]: http://discovermagazine.com/galleries/zen-photo/m/moores-law
    [8]: http://people.cs.clemson.edu/~mark/464/acmse_epic.pdf

    Literature

    Parallel Computer Architecture: A Hardware/Software Approach, D.E. Culler, J.P. Singh
    Computer Architecture: A Quantitative Approach, J.L. Hennessy, D.A. Patterson
    https://computing.llnl.gov/tutorials/parallel_comp/
    https://computing.llnl.gov/tutorials/parallel_comp/OverviewRecentSupercomputers.2008.pdf
    http://www-users.cs.umn.edu/~karypis/parbook/
    http://www.top500.org/
    http://15418.courses.cs.cmu.edu