
  • Zellescher Weg 12

    Willers-Bau A113

    Tel. +49 351 - 463 - 39835

    Matthias S. Mueller ([email protected])

    Holger Brunst ([email protected])

    Center for Information Services and High Performance Computing (ZIH)

    Leistungsanalyse von Rechnersystemen (Performance Analysis of Computer Systems)

    October 10, 2007

    Matthias Mueller, Holger Brunst: Leistungsanalyse

    Organization

    Lecture: Every Wednesday in INF E02 from 13:00 to 14:30

    Exercise: Every Thursday from 11:10 to 12:40

    – INF E09 for presentations

    – INF E40 for labs

    First Exercise: October 18th, guided tour through all new machine rooms at ZIH

    All slides will be in English

    Ten-minute summary of the previous lecture at the beginning of each lecture, also given in English

  • Class Material on the Web

    Slides will be put on the web prior to or shortly after each class

    ZIH web pages

    – http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/lehre/ws0708/lars

    Possibly: Bildungsportal Sachsen

    – http://bildungsportal.sachsen.de

    – Login required but identical to ZIH or INF login

    Lecture Script

    Not yet available :-(

    We would like to provide one

    Help needed

    Offer: Student helper

    – Attendance of both classes and exercises

    – LaTeX

    – Effort: 10 – 15 hours per week

  • Class Outline

    14 lectures with 12 corresponding exercises

    Class structure

    – Introduction and motivation

    – Performance requirements and common evaluation mistakes

    – Performance metrics and evaluation techniques

    – Workload types, selection, and characterization

    – Commonly used benchmarks

    – Benchmarks specialized on I/O

    – Monitoring techniques

    – Capacity planning for future systems

    – Performance data presentation

    – Comparing systems using sample data

    – Regression models

    – Experimental design

    – Performance simulation and prediction

    – Introduction to queuing theory

    Literature

    Raj Jain: The Art of Computer Systems Performance Analysis. John Wiley & Sons, Inc., 1991 (ISBN: 0-471-50336-3)

    Rainer Klar, Peter Dauphin, Franz Hartleb, Richard Hofmann, Bernd Mohr, Andreas Quick, Markus Siegle: Messung und Modellierung paralleler und verteilter Rechensysteme. B.G. Teubner Verlag, Stuttgart, 1995 (ISBN: 3-519-02144-7)

    Dongarra, Gentzsch, Eds.: Computer Benchmarks, Advances in Parallel Computing 8, North Holland, 1993 (ISBN: 0-444-81518-X)

  • Motivation

    Innovations that changed our daily life

    steam engine, motor: energy

    railway, car, airplanes: transportation

    fertilizer: food

    telephone: communication

    computer: data processing

  • Speed of data processing

    Human: 10^-2 FLOPS

    Workstation, PC: 10^8 FLOPS

    Supercomputer: 10^12 FLOPS

    Ratio: factor 10^10 to 10^14
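
    The ratio quoted on this slide follows directly from the three rates. A minimal sketch in Python, using exact fractions to avoid floating-point rounding; the variable names are illustrative and not taken from the slides:

```python
from fractions import Fraction

# Rates in FLOPS, as given on the slide
human = Fraction(1, 100)            # 10^-2
workstation = Fraction(10) ** 8     # 10^8
supercomputer = Fraction(10) ** 12  # 10^12

# How many times faster than a human each machine class is
ratio_workstation = workstation / human
ratio_supercomputer = supercomputer / human

print(ratio_workstation, ratio_supercomputer)
```

    The two results are the lower and upper end of the range stated on the slide.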

  • HPC – A key technology?

    USA defines strategic mission of HPC

    – Software, methods and human beings

    – Main motivation from military applications

    – Integration of know-how in the country

    – Attraction of experts from all over the world

    Japan:

    – Creator of the Earth-Simulator

    – Petaflop special purpose machine for MD simulations

    – Petaflop project is in preparation

    The EU started a European initiative

    Accelerated Strategic Computing Initiative (ASCI)

    Strategic Initiative in the U.S.A.

    ASCI Red (Sandia): Intel system with 1 TFLOPS

    ASCI Blue (LANL, LLNL): IBM and SGI, 3 TFLOPS each

    ASCI White (LLNL): IBM Power 3 (10 TFLOPS)

    ASCI Q (LANL): Compaq system (20 TFLOPS)

    Accelerated Strategic Computing Initiative -> Advanced Simulation and Computing (ASC)

    Red Storm (Sandia) Cray XT3, Opteron (40 TFLOPS)

    ASC Purple (LLNL): IBM Power 5 (100 TFLOPS)

    ASC BlueGene (LLNL): IBM PowerPC (180/360 TFLOPS)

  • What kind of know-how is required for HPC?

    Algorithms and methods

    Performance

    Programming (paradigms and details of implementations)

    Operation of supercomputers (network, infrastructure, service, support)

    Challenges

    Languages

    – Fortran95, C++, Java

    Parallelization:

    – MPI, OpenMP

    Network

    – Ethernet, Infiniband, Myrinet

    Scheduling

    – Distributed components, job scheduling, process scheduling

    System architecture

    – Processors, memory hierarchy

    What is the best programming model for clustered SMPs with a deep memory hierarchy?

  • Software – a key technology

    Software is a key factor for progress in our country

    Is Germany a location for software development?

    WWW is everywhere (E-Commerce, Google, EBay,…)

    Contribution of HPC:

    – Optimizing servers

    – Optimizing access to data bases

    – Optimizing applications

    – Technologies for Parallel Computing

    Do HPC and performance matter in “real life”?

    We just entered the Multi-Core Era:

    – IBM Power4, SUN T-1 (Niagara), Intel Core2 Duo, AMD Opteron DualCore

    – Embedded processors in telephones, …

    More cores in the future:

    An Intel prediction: technology might support

    2010: 16–64 cores, 200 GF–1 TF

    2013: 64–256 cores, 500 GF–4 TF

    2016: 256–1024 cores, 2 TF–20 TF

  • Center for Information Services and HPC

    A short introduction

    HPC in Germany

  • Center for Information Services and HPC (ZIH)

    • Central Scientific Unit at TU Dresden

    • Merged institution: TUD Computing Center (URZ) and Center for High Performance Computing (ZHR)

    • Competence Center for “Parallel Computing and Software Tools”

    • Strong commitment to support real users

    • Development of algorithms and methods: cooperation with users from all departments

    Structure

    Management

    – Director: Prof. Dr. W. E. Nagel

    – Deputy Directors: Dr. P. Fischer, Dr. M. Müller

    Units

    – IAK, Interdisciplinary Application Development and Coordination: Dr. M. Müller

    – NK, Network and Communication: W. Wünsch

    – ZSD, Central Systems and Services: Dr. S. Maletti

    – IMC, Innovative Methods of Computing: PD Dr. A. Deutsch

    – PSW, Programming and Software Tools: Dr. H. Mix

  • Responsibilities of ZIH

    • Providing infrastructure and qualified service for TU Dresden and Saxony

    • Research topics

    Architecture and performance analysis of High Performance Computers

    Programming methods and techniques for HPC systems

    Software tools to support programming and optimization

    Modeling algorithms of biological processes

    Mathematical models, algorithms, and efficient implementations

    • Role of mediator between vendors, developers, and users

    • Adoption and preparation of new concepts, methods, and techniques

    • Teaching and Education

    Concept of Installation

    HPC-Component: SGI® Altix® 4700, 2048 Montecito cores, 6.5 TByte main memory

    PC-Farm: system from Linux Networx, AMD Opteron CPUs (dual core, 2.6 GHz), 728 boards with 2592 cores, InfiniBand networks between the nodes

    HPC-Server: main memory 6 TB

    HPC-SAN: capacity 68 TB

    PC-SAN: capacity 50 TB

    PetaByte Tape Silo: capacity 1 PB

    Link bandwidths in the diagram: 8 GB/s, 4 GB/s, 4 GB/s, 1.8 GB/s

  • HPC-System: SGI Altix 4700

    32 x 42U racks

    1024 sockets with Itanium 2 Montecito dual-core CPUs (1.6 GHz, 9 MB L3 cache)

    13 TFlop/s peak performance

    11.9 TFlop/s Linpack

    6.5 TB shared memory
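
    The peak figure on this slide can be reproduced from the socket count and clock rate. The sketch below is a back-of-the-envelope check, not a vendor formula; it assumes 4 floating-point operations per cycle per core (two fused multiply-adds), which matches the 12.8 GFlops per dual-core socket quoted later in the deck. All names are illustrative.

```python
SOCKETS = 1024
CORES_PER_SOCKET = 2     # dual-core Montecito
CLOCK_GHZ = 1.6
FLOPS_PER_CYCLE = 4      # assumption: 2 fused multiply-adds per cycle per core
LINPACK_TFLOPS = 11.9    # measured value from the slide


def peak_tflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFlop/s: cores x clock rate x flops per cycle."""
    gflops = sockets * cores_per_socket * clock_ghz * flops_per_cycle
    return gflops / 1000.0


peak = peak_tflops(SOCKETS, CORES_PER_SOCKET, CLOCK_GHZ, FLOPS_PER_CYCLE)
efficiency = LINPACK_TFLOPS / peak  # fraction of peak reached by Linpack

print(f"peak = {peak:.1f} TFlop/s, Linpack efficiency = {efficiency:.0%}")
```

    Under these assumptions the peak comes out at about 13.1 TFlop/s, and the Linpack result corresponds to roughly 91 % of peak.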

    Linux Networx PC-Farm (Deimos)

    26 water-cooled racks (Knürr)

    1296 AMD Opteron x85 dual-core CPUs (2.6 GHz)

    728 compute nodes with 2 (384), 4 (232) or 8 (112) cores

    2 master and 11 Lustre servers

    2 GB memory per core

    50 TB SAN disc

    Local scratch discs (70, 150, 290 GB)

    2 4x InfiniBand fabrics (MPI + I/O)

    OS: SuSE SLES 10

    Batch system: LSF

    Compilers: PathScale, PGI, Intel, GNU

    ISV codes: Ansys100, CFX, Fluent, Gaussian, LS-DYNA, Matlab, MSC
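
    The node, core, and CPU counts on this slide are mutually consistent, which a short calculation confirms. This is only a cross-check of the slide's numbers; the dictionary layout and names are illustrative.

```python
# Cores per node -> number of such nodes, as listed on the slide
NODES_BY_CORES = {2: 384, 4: 232, 8: 112}
CORES_PER_CPU = 2   # dual-core Opteron
GB_PER_CORE = 2     # memory per core from the slide

total_nodes = sum(NODES_BY_CORES.values())
total_cores = sum(cores * nodes for cores, nodes in NODES_BY_CORES.items())
total_cpus = total_cores // CORES_PER_CPU
total_memory_gb = total_cores * GB_PER_CORE

print(total_nodes, total_cores, total_cpus, total_memory_gb)
```

    The totals match the 728 nodes, 2592 cores, and 1296 CPUs quoted for the PC farm; the aggregate memory of 5184 GB is derived here and is not stated on the slides.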

  • Storage Technology

    SGI InfiniteStorage 6700 (DDN S2A9500)

    4 GBit/s FC technology

    HPC-SAN:

    – 68 TB capacity

    – 8 GB/s performance

    PC-SAN:

    – 68 TB capacity

    – 4 GB/s performance

    DDN discs

    Petabyte Tape Archive

    2500 Slots

    30 LTO-3 tape drives

    2500 LTO-3 tapes

    1.8 GB/s performance

    SUN STK SL8500
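
    The "petabyte" label can be checked against the slot count. The sketch assumes the native (uncompressed) LTO-3 cartridge capacity of 400 GB, which is not stated on the slide, and uses decimal units throughout. The per-drive rate is simply the quoted aggregate divided by the number of drives.

```python
SLOTS = 2500
TAPE_CAPACITY_GB = 400   # assumption: native LTO-3 capacity per cartridge
DRIVES = 30
AGGREGATE_MB_S = 1800    # 1.8 GB/s from the slide, in decimal MB/s

total_pb = SLOTS * TAPE_CAPACITY_GB / 1_000_000   # GB -> PB
per_drive_mb_s = AGGREGATE_MB_S / DRIVES          # implied average per drive

print(f"{total_pb} PB total, {per_drive_mb_s} MB/s per drive")
```

    With these inputs the archive holds 1 PB when every slot is filled, and the aggregate rate implies an average of 60 MB/s per drive.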

  • Realization of the concept

    HPC component: main memory 6.5 TB

    PC farm

    HPC-SAN: disk capacity 68 TB

    PC-SAN: disk capacity 68 TB

    Petabyte tape archive: capacity 1 PB

    Link bandwidths in the diagram: 8 GB/s, 4 GB/s, 4 GB/s, 1.8 GB/s

    Computer Rooms – Extension to the Building

  • HRSK-Building Status in March 2006

    Configuration of overall system: SAN Overview

  • Description of the SGI solution: HPC-SAN

    • Total capacity: 68 TB

    • 4 Gb/s FC throughout

    • 4 x DDN S2A 9500 with 17 TB each

    • 584 hard disks, 146 GB each

    • CXFS/DMF on Altix 350 (24 Itanium)

    • TP 9300 (MDS storage subsystem), 14 x 73 GB for metadata

    • Access from the PC farm: NFS servers on 12 x Altix 350 with 2 CPUs each, or Opteron (for RDMA access)

    Description of the SGI solution: HPC component

    • more than 500 dual-core Itanium 2 (Montecito)

    • 1.6 GHz, 18 MB L3 (9 MB per core)

    • 12.8 GFlops peak

    • 4…8 GB RAM (DDR2), Σ = 6.5 TB

    • connected via SGI NUMAlink 4

    • bandwidth: 3.2 GB/s per node and direction

    • fat-tree topology

    • graphics pipes + graphics compositor

    • RASC blade with two FPGAs (RASClib)

  • Description of the SGI solution: PC farm

    • more than 700 boards

    • Processors: AMD Opteron

    • Interconnect: IB 4x

    • Compute nodes connected via three switches (288 ports)

    • Connection to the HPC-SAN via 12 NFS servers (CXFS clients)

    Description of the SGI solution: PC-SAN

    • Lustre FS

    • 2 x DDN S2A 9500

    • Capacity: 50.9 TB

    • 440 hard disks, 146 GB each
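
    The disk counts and capacities listed for the two SANs allow a quick comparison of raw and quoted capacity. The sketch below only restates the slide figures in decimal units; the slides do not say what accounts for the difference (redundancy or spare disks would be a plausible guess), so the ratio is reported without interpretation. Names are illustrative.

```python
DISK_GB = 146  # capacity per disk from the slides

# Disk counts and quoted capacities as listed for each SAN
SANS = {
    "HPC-SAN": {"disks": 584, "quoted_tb": 68.0},
    "PC-SAN": {"disks": 440, "quoted_tb": 50.9},
}


def raw_tb(disks, disk_gb=DISK_GB):
    """Raw capacity in TB (decimal) for a given number of disks."""
    return disks * disk_gb / 1000.0


ratios = {}
for name, san in SANS.items():
    raw = raw_tb(san["disks"])
    ratios[name] = san["quoted_tb"] / raw
    print(f"{name}: raw {raw:.1f} TB, quoted {san['quoted_tb']} TB, "
          f"ratio {ratios[name]:.2f}")
```

    Both systems come out at a quoted-to-raw ratio of about 0.8, so the two capacity figures are at least consistent with each other.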

  • Tape Silo - Details

    • CXFS/DMF server on Altix 350 (24 CPUs, 48 GB)

    • Data Migration Facility (license for 1 or 2 PB)

    • 2 x FC switches (24 ports)

    • StorageTek SL 8500 (SUN), ACSLS license for 2500 slots

    Performance of Computers at ZIH

  • Some Activities

    Vampir: Technical Components

    Diagram labels: Server, Master, Worker 1 … Worker m, Trace 1 … Trace N

    Tools

    1. Trace generator

    2. Classical Vampir viewer and analyzer

    3. Vampir client viewer

    4. Parallel server engine

    5. Conversion and analysis tools

  • Vampir: Timeline

    Vampir: Scalability

    sPPM ASCI Benchmark (3D gas dynamics)

    Data to be analyzed: 16 processes, 200 MByte volume

    Number of Workers       1      2      4      8     16     32
    Load Time           47.33  22.48  10.80   5.43   3.01   3.16
    Timeline             0.10   0.09   0.06   0.08   0.09   0.09
    Summary Profile      1.59   0.87   0.47   0.30   0.28   0.25
    Process Profile      1.32   0.70   0.38   0.26   0.17   0.17
    Com. Matrix          0.06   0.07   0.08   0.09   0.09   0.09
    Stack Tree           2.57   1.39   0.70   0.44   0.25   0.25
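
    The scalability numbers are easier to read as speedup and parallel efficiency relative to a single worker. The sketch below applies the usual definitions (speedup = T(1) / T(n), efficiency = speedup / n) to the load times from this slide; the function name is illustrative and the slide does not give the time unit.

```python
WORKERS = [1, 2, 4, 8, 16, 32]
LOAD_TIME = [47.33, 22.48, 10.80, 5.43, 3.01, 3.16]  # values from the slide


def speedup_and_efficiency(workers, times):
    """Return (workers, speedup, efficiency) tuples relative to the first entry."""
    base = times[0]
    rows = []
    for n, t in zip(workers, times):
        speedup = base / t
        rows.append((n, speedup, speedup / n))
    return rows


rows = speedup_and_efficiency(WORKERS, LOAD_TIME)
for n, speedup, efficiency in rows:
    print(f"{n:>2} workers: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

    Load time scales almost linearly up to 16 workers and stops improving at 32. That is consistent with the trace containing 16 processes, although the slide does not state the cause.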

  • Vampir: A Large Test Case

    IRS ASCI Benchmark

    Implicit Radiation Solver

    Data to be analyzed:

    64 Processes in 8 Streams

    Approx. 800,000,000 events

    40 GByte Data Volume

    Analysis Platform:

    Jump.fz-juelich.de

    41 IBM p690 nodes

    32 processors per node

    128 GByte per node

    Visualization Platform:

    Remote Laptop

    BenchIT: Key Components 1

    BenchIT measurement core

    – Measurement kernels

    – Exact timer

    – Running kernels with variable problem sizes

    – Generating result files
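
    The four responsibilities of a measurement core listed above can be illustrated with a generic timing harness. This is a sketch of the general technique only and does not reproduce BenchIT's actual code, interfaces, or result-file format; the kernel, the function names, and the CSV layout are all hypothetical.

```python
import csv
import io
import time


def kernel(n):
    """Hypothetical measurement kernel: sum of squares over n elements."""
    return sum(i * i for i in range(n))


def measure(kernel_fn, problem_sizes, repetitions=3):
    """Run kernel_fn for each problem size and keep the best wall-clock time."""
    results = []
    for n in problem_sizes:
        best = float("inf")
        for _ in range(repetitions):
            start = time.perf_counter()  # high-resolution monotonic timer
            kernel_fn(n)
            best = min(best, time.perf_counter() - start)
        results.append((n, best))
    return results


def write_results(results, stream):
    """Write one CSV row per problem size (illustrative format only)."""
    writer = csv.writer(stream)
    writer.writerow(["problem_size", "seconds"])
    writer.writerows(results)


results = measure(kernel, [1_000, 10_000, 100_000])
out = io.StringIO()
write_results(results, out)
print(out.getvalue())
```

    Taking the minimum over a few repetitions is one common way to reduce noise from other activity on the machine; a median or mean with variance would be equally valid choices.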

  • BenchIT: Key Components 2–4

    BenchIT measurement core

    Command line interface

    GUI

    Website

    Base data GUI with rates.pbd loaded

    bdm_rates_GUI.png

  • Relationship between stride and L2_Hit_Ratio

    bdm_rates_series-stride_x-stride_y1_runtime_y2-L2_Hit_Ratio.png

    Relationship between stride and L3_Hit_Ratio

    /bdm_rates_series-stride_x-stride_y1_runtime_y2-L3_Hit_Ratio.png

  • Compute performance and memory transfer of all seq. KVs

    bdm_rates_series-none_x-FLOPS_pT_y1_Mem_Transfer_pT_y2-none.png

    Multiprogramming Test_7_0

    bdm_rates_series-stride_x-stride_y1_FLOPS_pT_y2-none.png

  • Multiprogramming Test_7_2

    bdm_rates_series-stride_x-stride_y1_FLOPS_pT_y2-none.png

    Multiprogramming Test_7_13

    bdm_rates_series-stride_x-stride_y1_FLOPS_pT_y2-none.png

  • Pt2Pt latency between all possible pairs

    [Figure: plot of "64-2/result-all2all-latency.log" (gnuplot columns 4:5:2); both pair axes run from 0 to 70, latency scale from 0.005 to 0.0075]

    Pt2Pt bandwidth between all possible pairs

    [Figure: plot of "64/result-all2all-bandwidth.log" (gnuplot columns 3:4:2); both pair axes run from 0 to 70, bandwidth scale from 440 to 600]

  • Thank you!

    Hope to see you next time…