Zellescher Weg 12
Willers-Bau A113
Tel. +49 351 - 463 - 39835
Matthias S. Mueller (matthias.mueller@tu-dresden.de)
Holger Brunst (holger.brunst@tu-dresden.de)
Center for Information Services and High Performance Computing (ZIH)
Leistungsanalyse von Rechnersystemen (Performance Analysis of Computer Systems)
October 10, 2007
Organization
Lecture: Every Wednesday in INF E02 from 13:00 to 14:30
Exercise: Every Thursday from 11:10 to 12:40
– INF E09 for presentations
– INF E40 for labs
First Exercise: October 18th, guided tour through all new machine rooms at ZIH
All slides will be in English
Ten-minute summary of the last lecture at the beginning of each lecture, also given in English
Class Material on the Web
Slides will be put on the web prior to or shortly after each class
ZIH web pages
– http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/lehre/ws0708/lars
Possibly: Bildungsportal Sachsen
– http://bildungsportal.sachsen.de
– Login required but identical to ZIH or INF login
Lecture Script
Not yet available :-(
We would like to provide one
Help needed
Offer: Student helper
– Attendance of both classes and exercises
– LaTeX
– Effort: 10 – 15 hours per week
Class Outline
14 lectures with 12 corresponding exercises
Class structure
– Introduction and motivation
– Performance requirements and common evaluation mistakes
– Performance metrics and evaluation techniques
– Workload types, selection, and characterization
– Commonly used benchmarks
– Benchmarks specializing in I/O
– Monitoring techniques
– Capacity planning for future systems
– Performance data presentation
– Comparing systems using sample data
– Regression models
– Experimental design
– Performance simulation and prediction
– Introduction to queuing theory
Literature
Raj Jain: The Art of Computer Systems Performance Analysis. John Wiley & Sons, Inc., 1991 (ISBN: 0-471-50336-3)
Rainer Klar, Peter Dauphin, Franz Hartleb, Richard Hofmann, Bernd Mohr, Andreas Quick, Markus Siegle: Messung und Modellierung paralleler und verteilter Rechensysteme. B.G. Teubner Verlag, Stuttgart, 1995 (ISBN: 3-519-02144-7)
Dongarra, Gentzsch, Eds.: Computer Benchmarks, Advances in Parallel Computing 8, North Holland, 1993 (ISBN: 0-444-81518-X)
Motivation
Innovations that changed our daily life
Steam engine, motor: energy
Railway, car, airplanes: transportation
Fertilizer: food
Telephone: communication
Computer: data processing
Speed of data processing
Human: 10^-2 FLOPS
Workstation, PC: 10^8 FLOPS
Supercomputer: 10^12 FLOPS
Ratio: factor of 10^10 to 10^14
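For example, the extremes of this range follow directly from the figures above: 10^12 / 10^-2 = 10^14 between a supercomputer and a human, and 10^8 / 10^-2 = 10^10 between a PC and a human.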
HPC – A key technology?
USA defines strategic mission of HPC
– Software, methods and human beings
– Main motivation from military applications
– Integration of know-how in the country
– Attraction of experts from all over the world
Japan:
– Creator of the Earth-Simulator
– Petaflop special purpose machine for MD simulations
– Petaflop project is in preparation
The EU has started a European initiative
Accelerated Strategic Computing Initiative (ASCI)
Strategic Initiative in the U.S.A.
ASCI Red (Sandia): Intel system with 1 TFLOPS
ASCI Blue (LANL, LLNL): IBM and SGI, 3 TFLOPS each
ASCI White (LLNL): IBM Power 3 (10 TFLOPS)
ASCI Q (LANL): Compaq system (20 TFLOPS)
Accelerated Strategic Computing Initiative -> Advanced Simulation and Computing (ASC)
Red Storm (Sandia) Cray XT3, Opteron (40 TFLOPS)
ASC Purple (LLNL): IBM Power 4 (100 TFLOPS)
ASC BlueGene (LLNL): IBM PowerPC (180/360 TFLOPS)
What kind of know-how is required for HPC?
Algorithms and methods
Performance
Programming (paradigms and details of implementations)
Operation of supercomputers (network, infrastructure, service, support)
Challenges
Languages
– Fortran95, C++, Java
Parallelization:
– MPI, OpenMP
Network
– Ethernet, Infiniband, Myrinet
Scheduling
– Distributed components, job scheduling, process scheduling
System architecture
– Processors, memory hierarchy
What is the best programming model for clustered SMPs with a deep memory hierarchy?
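One frequently discussed candidate answer for clustered SMPs is a hybrid approach: MPI between nodes, OpenMP within the shared memory of a node. A minimal sketch of what such a program looks like (illustrative only; the file name and output are not part of the lecture material):

/* hybrid.c - minimal MPI + OpenMP sketch: one MPI process per node,
   OpenMP threads inside each node's shared memory.
   Build, e.g.: mpicc -fopenmp hybrid.c -o hybrid */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nprocs;

    /* Ask for thread support because OpenMP threads run inside each rank. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    #pragma omp parallel
    {
        /* Each thread reports its position in the two-level hierarchy. */
        printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
               rank, nprocs, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

Whether such a hybrid actually outperforms pure MPI depends on the memory hierarchy and the communication pattern, which is exactly the open question above.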
Software – a key technology
Software is a key factor for progress in our country
Is Germany a location for software development?
WWW is everywhere (E-Commerce, Google, EBay,…)
Contribution of HPC:
– Optimizing servers
– Optimizing access to databases
– Optimizing applications
– Technologies for Parallel Computing
Do HPC and performance matter in “real life”?
We just entered the Multi-Core Era:
– IBM Power4, Sun T1 (Niagara), Intel Core 2 Duo, AMD Opteron Dual-Core
– Embedded processors in telephones, ….
More cores in the future:
An Intel prediction of what the technology might support:
2010: 16–64 cores, 200 GF–1 TF
2013: 64–256 cores, 500 GF–4 TF
2016: 256–1024 cores, 2 TF–20 TF
Center for Information Services and HPC
A short introduction
HPC in Germany
Center for Information Services and HPC (ZIH)
• Central Scientific Unit at TU Dresden
• Merged institution: TUD Computing Center (URZ) and Center for High Performance Computing (ZHR)
• Competence Center for "Parallel Computing and Software Tools"
• Strong commitment to support real users
• Development of algorithms and methods: cooperation with users from all departments
Structure
Unit IAK: Interdisciplinary Application Development and Coordination (Dr. M. Müller)
Unit NK: Network and Communication (W. Wünsch)
Unit ZSD: Central Systems and Services (Dr. S. Maletti)
Unit IMC: Innovative Methods of Computing (PD Dr. A. Deutsch)
Unit PSW: Programming and Software Tools (Dr. H. Mix)
Management: Director Prof. Dr. W. E. Nagel; Deputy Directors Dr. P. Fischer and Dr. M. Müller
Responsibilities of ZIH
• Providing infrastructure and qualified service for TU Dresden and Saxony
• Research topics
Architecture and performance analysis of High Performance Computers
Programming methods and techniques for HPC systems
Software tools to support programming and optimization
Modeling algorithms of biological processes
Mathematical models, algorithms, and efficient implementations
• Role of mediator between vendors, developers, and users
• Taking up and preparing new concepts, methods, and techniques
• Teaching and Education
Concept of Installation
HPC component: SGI Altix 4700, 2048 Montecito cores, 6.5 TByte main memory
PC farm: system from Linux Networx, AMD Opteron CPUs (dual core, 2.6 GHz), 728 boards with 2592 cores, InfiniBand network between the nodes
HPC-SAN: 68 TB capacity, 8 GB/s to the HPC server (6 TB main memory)
PC-SAN: 50 TB capacity, 4 GB/s to the PC farm
PetaByte tape silo: 1 PB capacity, 1.8 GB/s
HPC-System: SGI Altix 4700
32 x 42U racks
1024 sockets with Itanium 2 Montecito dual-core CPUs (1.6 GHz / 9 MB L3 cache)
13 TFlop/s peak performance
11.9 TFlop/s Linpack
6.5 TB shared memory
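As a plausibility check (assuming the usual 4 floating-point operations per cycle for Itanium 2, i.e. two fused multiply-adds per cycle), the quoted figures are consistent: 2 cores x 1.6 GHz x 4 flops/cycle = 12.8 GFlop/s per socket, and 1024 sockets x 12.8 GFlop/s ≈ 13.1 TFlop/s, matching the 13 TFlop/s peak; the Linpack result of 11.9 TFlop/s is roughly 91% of peak.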
Linux Networx PC-Farm (Deimos)
26 water-cooled racks (Knürr)
1296 AMD Opteron x85 dual-core CPUs (2.6 GHz)
728 compute nodes with 2 (384), 4 (232), or 8 (112) cores
2 master and 11 Lustre servers
2 GB memory per core
50 TB SAN disk
Local scratch disks (70, 150, 290 GB)
Two 4x InfiniBand fabrics (MPI + I/O)
OS: SuSE SLES 10
Batch system: LSF
Compilers: Pathscale, PGI, Intel, GNU
ISV codes: Ansys100, CFX, Fluent, Gaussian, LS-DYNA, Matlab, MSC
Storage Technology
SGI InfiniteStorage 6700 (DDN S2A9500)
4 GBit/s FC technology
HPC-SAN:
– 68 TB capacity
– 8 GB/s performance
PC-SAN:
– 68 TB capacity
– 4 GB/s performance
DDN disks
Petabyte Tape Archive
2500 Slots
30 LTO-3 tape drives
2500 LTO-3 tapes
1.8 GB/s performance
SUN STK SL8500
Realization of the concept
HPC component: 6.5 TB main memory
PC farm
HPC-SAN: 68 TB disk capacity, 8 GB/s
PC-SAN: 68 TB disk capacity, 4 GB/s
PetaByte tape archive: 1 PB capacity, 1.8 GB/s
Computer Rooms – Extension to the Building
HRSK-Building Status in March 2006
Configuration of overall system: SAN Overview
Description of the SGI Solution: HPC-SAN
• Total capacity: 68 TB
• 4 Gb/s FC throughout
• 4 x DDN S2A 9500, 17 TB each
• 584 disks of 146 GB each
• CXFS/DMF on Altix 350 (24 Itanium CPUs)
• TP 9300 (MDS storage subsystem), 14 x 73 GB for metadata
• Access from the PC farm: NFS servers on 12 x Altix 350 with 2 CPUs each, or Opteron (for RDMA access)
Description of the SGI Solution: HPC Component
• More than 500 dual-core Itanium 2 (Montecito) processors
• 1.6 GHz, 18 MB L3 cache (9 MB per core)
• 12.8 GFlops peak
• 4 to 8 GB RAM (DDR2), 6.5 TB in total
• Connected via SGI NUMAlink 4
• Bandwidth: 3.2 GB/s per node and direction
• Fat-tree topology
• Graphics pipes + graphics compositor
• RASC blade with two FPGAs (RASClib)
Description of the SGI Solution: PC Farm
• More than 700 boards
• Processors: AMD Opteron
• Interconnect: InfiniBand 4x
• Compute nodes connected via three switches (288 ports)
• Connection to the HPC-SAN via 12 NFS servers (CXFS clients)
Description of the SGI Solution: PC-SAN
• Lustre file system
• 2 x DDN S2A 9500
• Capacity: 50.9 TB
• 440 disks of 146 GB each
Tape Silo - Details
• CXFS/DMF server on Altix 350 (24 CPUs, 48 GB)
• Data Migration Facility (licence for 1 or 2 PB)
• 2 x FC switches (24 ports)
• StorageTek SL 8500 (SUN), ACSLS licence for 2500 slots
Performance of Computers at ZIH
Some Activities
Vampir: Technical Components
(Architecture diagram: a master and workers 1 to m of the parallel server read trace files 1 to N.)
Tools
1. Trace generator
2. Classical Vampir viewer and analyzer
3. Vampir client viewer
4. Parallel server engine
5. Conversion and analysis tools
Vampir: Timeline
Vampir: Scalability
sPPM ASCI Benchmark
3D gas dynamics
Data to be analyzed:
16 Processes
200 MByte Volume
Number of Workers    1      2      4      8      16     32
Load Time          47.33  22.48  10.80   5.43   3.01   3.16
Timeline            0.10   0.09   0.06   0.08   0.09   0.09
Summary Profile     1.59   0.87   0.47   0.30   0.28   0.25
Process Profile     1.32   0.70   0.38   0.26   0.17   0.17
Com. Matrix         0.06   0.07   0.08   0.09   0.09   0.09
Stack Tree          2.57   1.39   0.70   0.44   0.25   0.25
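Reading the load-time row as a strong-scaling experiment: the speedup with 16 workers is 47.33 / 3.01 ≈ 15.7, an efficiency of about 98%; with 32 workers the load time rises again slightly (3.16), so scaling saturates at around 16 workers for this 200 MByte data set.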
Vampir: A Large Test Case
IRS ASCI Benchmark
Implicit Radiation Solver
Data to be analyzed:
64 Processes in 8 Streams
Approx. 800,000,000 events
40 GByte Data Volume
Analysis Platform:
Jump.fz-juelich.de
41 IBM p690 nodes
32 processors per node
128 GByte per node
Visualization Platform:
Remote Laptop
BenchIT: Key Components 1
BenchIT measurement core
– Measurement kernels
– Exact timer
– Running kernels with variable problem sizes
– Generating result files
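The idea can be illustrated with a small stand-alone sketch (this is not the BenchIT API; the kernel, timer, and result format below are invented for illustration): run a kernel over a series of problem sizes, time each run, and append one result line per size to a file.

/* sweep.c - illustrative sketch in the spirit of a measurement core.
   Build, e.g.: cc -O2 sweep.c -o sweep -lrt */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Example measurement kernel: vector triad a[i] = b[i] + s * c[i]. */
static void kernel(double *a, const double *b, const double *c,
                   double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}

/* "Exact timer": monotonic clock, returned as seconds in a double. */
static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    FILE *out = fopen("result.dat", "w");              /* result file */
    if (!out) return 1;

    for (size_t n = 1024; n <= (1u << 22); n *= 2) {   /* problem sizes */
        double *a = malloc(n * sizeof *a);
        double *b = malloc(n * sizeof *b);
        double *c = malloc(n * sizeof *c);
        for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }

        double t0 = now();
        kernel(a, b, c, 3.0, n);
        double t = now() - t0;

        /* 2 flops per element: one add, one multiply */
        fprintf(out, "%zu %g %g\n", n, t, 2.0 * n / t);
        free(a); free(b); free(c);
    }
    fclose(out);
    return 0;
}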
BenchIT: Key Components 2
BenchIT measurement core
Command line interface
BenchIT: Key Components 3
BenchIT measurement core
Command line interface
GUI
BenchIT: Key Components 4
BenchIT measurement core
Command line interface
GUI
Website
Base-data GUI with a loaded rates.pbd file (screenshot)
Relation between stride and L2 hit ratio (plot: runtime and L2 hit ratio vs. stride)
Relation between stride and L3 hit ratio (plot: runtime and L3 hit ratio vs. stride)
Compute performance and memory transfer of all sequential kernels (plot: FLOPS and memory transfer per time)
Multiprogramming Test_7_0 (plot)
Multiprogramming Test_7_2 (plot)
Multiprogramming Test_7_13 (plot)
Pt2Pt latency between all possible pairs
(Surface plot over all process pairs, from "64-2/result-all2all-latency.log"; latency values range from about 0.005 to 0.0075.)
Pt2Pt bandwidth between all possible pairs
(Surface plot over all process pairs, from "64/result-all2all-bandwidth.log"; bandwidth values range from about 440 to 600.)
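Plots like the two above are typically produced by repeating a ping-pong measurement for every pair of processes: small messages for latency, large messages for bandwidth. A minimal sketch of the per-pair measurement (not the benchmark actually used for these plots):

/* pingpong.c - sketch of a point-to-point bandwidth measurement between
   rank 0 and rank 1; the plots above repeat this for all pairs.
   Run with at least two processes, e.g.: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    const int bytes = 1 << 20;             /* 1 MiB message for bandwidth */
    char *buf = malloc(bytes);
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t = (MPI_Wtime() - t0) / reps;   /* average round-trip time */

    if (rank == 0)
        printf("half round-trip: %g s, bandwidth: %g MB/s\n",
               t / 2, bytes / (t / 2) / 1e6);

    MPI_Finalize();
    free(buf);
    return 0;
}

For a latency figure one would use a message of zero or a few bytes and report the half round-trip time directly instead of the bandwidth.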
Thank you!
Hope to see you next time…