Innovation Intelligence ® Accelerating Complex Simulations: An Example From Manufacturing Eric Lequiniou Director, High Performance Computing Altair [email protected]

Accelerating Complex Simulations: An Example From … · 2014. 11. 10.

  • Innovation Intelligence®

    Accelerating Complex Simulations:

    An Example From Manufacturing

    Eric Lequiniou

    Director, High Performance Computing

    Altair

    [email protected]

  • 2

    Copyright © 2014 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

    Agenda

    • Who is Altair?

    • What is RADIOSS and its performance?

    • How Altair uses Intel tools to optimize our software

    • Q&A

  • 3

    Overview

    Founded in 1985 as a product design consulting company.

    Today: a global software, services & technology leader with over 45 offices in 21 countries and 5,000+ customers worldwide.

    [Chart: growth from ’85 to ’13; value labels $100M and $250M]

  • 4

    Innovation Intelligence®

    27+ Years of Innovation

    45+ Offices in 21 Countries

    2000+ Employees Worldwide

  • 5

    Altair Knows HPC

    Altair is the only company that:

    makes HPC tools…

    AND develops HPC applications…

    …AND uses these to solve real problems

    700+ Altair engineers worldwide work with clients every day to address technical computing challenges and develop solutions

  • 6

    Intel and Altair: Partners in HPC

    A history of collaboration…

    • Cluster Management: PBS-Intel integrations

      • MPI integration

      • Intel® Cluster Checker

      • Xeon Phi coprocessor

    • Certifications: Intel Cluster-Ready

      • PBS Professional

      • Solvers (RADIOSS, OptiStruct, AcuSolve, FEKO)

      • ICR 2011, 2012 and 2013 partner awards

    • Application Integration: Use of Intel tools and technologies

      • Intel® MPI, Intel® Fortran & C++ compilers, Intel® MKL Library, Intel® VTune™ Amplifier, Intel Trace Analyzer and Collector

      • Benchmarking activities on large cluster configurations

    • Professional Support: Close collaboration among technical personnel

      • Access to Intel hardware resources: SDV systems, large cluster

      • Intel technical expertise helps us to optimize our software on Intel systems

  • 7

    HyperWorks Solver Technology

    Multiphysics Analysis and Optimization

    • Structural Analysis

    • Manufacturing Simulation

    • Systems Simulation

    • Fluid Dynamics

    • Thermal Analysis

    • Crash, Safety, Impact & Blast

    • Electromagnetics

  • 8

    HyperWorks Solvers

    • OptiStruct: Statics, NVH, Thermal, Non-Linear

    • RADIOSS: Highly Non-Linear (Crash, Safety, Forming)

    • MotionSolve: Multi-body Dynamics

    • AcuSolve: Thermal and CFD

    • FEKO: Electromagnetics

    Optimization

    Smart Multiphysics

  • 9

    RADIOSS: The Standard Behind Structure Safety

    Multiphysics Analysis and Optimization

    • Crash and Safety

    • Drop & Impact

    • Blast & Hydrodynamic Impact

    • Fluid-Structure Interaction

    • Terminal Ballistic

    • Forming & Composites Mapping

  • 10

    Introduction to RADIOSS and HPC Numerical Simulation

    1987: First full car crash computation

    • 20 000 elements only, with limited accuracy

    • Took 20 hours on the Cray X-MP vector supercomputer

    Today: 15-million-element car crash simulation in less than 5 hours

    • Going massively parallel is key to reaching this performance on clusters

    • RADIOSS is optimized to deliver the best performance from a single CPU to large clusters

    • RADIOSS embeds state-of-the-art numerical methods and parallelization techniques

    • RADIOSS is typically run on 64 to 128 cores for today's industrial crash models, with proven scalability up to 8000 cores

  • 11

    About RADIOSS

    • Finite Element Analysis (FEA) solver for highly non-linear simulations

    • Differentiated by its scalability, quality, and robustness

    • Legacy code of several million lines of Fortran

    • Ported to various systems, supercomputers, clusters and accelerators

    Compute-intensive simulation software for Manufacturing

  • 12

    Key Parallel Technologies: Hybrid MPI OpenMP

    Highly parallel code with Hybrid model

    • Domain decomposition with MPI

    • OpenMP parallelization

    • Explicit multitasking

    • Loop auto-parallelization

    Enhanced performance

    • High efficiency on large HPC clusters

    • Unique proven method for rich scalability over thousands of cores for FEA

    • Flexibility – easy tuning of MPI & OpenMP

    • Double Precision as default – Extended Single Precision ~ 1.5X faster

    Robustness

    • Parallel arithmetic option allows perfect repeatability in parallel
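The repeatability point can be illustrated with a minimal Python sketch. It is not how RADIOSS implements its parallel arithmetic option (the slides only state that the option exists). It assumes four hypothetical per-domain contributions to a single value and shows that a floating-point sum depends on the order of accumulation, so reducing in a fixed domain order yields the same result on every run.

```python
"""Why a fixed reduction order gives repeatable parallel sums (illustrative only)."""

# Hypothetical per-domain contributions to one nodal quantity, keyed by domain id.
CONTRIBUTIONS = {0: 1e16, 1: 1.0, 2: -1e16, 3: 1.0}


def reduce_in_arrival_order(contributions, arrival_order):
    """Accumulate contributions in the order the messages happen to arrive."""
    total = 0.0
    for domain in arrival_order:
        total += contributions[domain]
    return total


def reduce_in_fixed_order(contributions):
    """Accumulate contributions in ascending domain id, whatever the arrival order."""
    total = 0.0
    for domain in sorted(contributions):
        total += contributions[domain]
    return total


if __name__ == "__main__":
    # Two runs of the same model in which messages arrive in a different order.
    run_a = reduce_in_arrival_order(CONTRIBUTIONS, [0, 1, 2, 3])
    run_b = reduce_in_arrival_order(CONTRIBUTIONS, [1, 3, 0, 2])
    print("arrival order, run A:", run_a)
    print("arrival order, run B:", run_b)
    print("fixed order:", reduce_in_fixed_order(CONTRIBUTIONS))
```

A fixed order makes the result repeatable, not more accurate: the two arrival orders disagree because of rounding, and the fixed order simply picks one of the possible roundings every time.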

  • 13

    RADIOSS: Performance Improvements on Single Node

    Performance increased by 14x from Woodcrest to Haswell

    • This breakthrough comes from hardware and software optimization

    • The most important factors are the increase in core count and software scalability

    [Chart: Single Node Dual Socket Performance* Evolution. Series: Performance, #core per node, Freq GHz. Axes: Performance (0 to 15) and #cores (0 to 32).]

    * Based on RADIOSS performance on the Neon 1M benchmark, DP version

  • 14

    RADIOSS: HMPP Scalability on Clusters

    [Chart: Neon Refined 1 Million 80ms, RADIOSS v13.0 beta, Scalability Study up to 1280 cores. X axis: Number of Nodes* (1, 2, 4, 8, 16, 32, 64). Y axis: Elapsed (s), 0 to 10000. Series: 1 thread, 2 threads, 5 threads, 10 threads.]

    Bar labels as extracted, one group per line (the mapping to series and node counts is not recoverable from the text):

    8568, 4516, 2416, 1511, 1242, 2104

    8421, 4319, 2294, 1375, 985, 905, 1645

    8250, 4293, 2340, 1362, 888, 629, 614

    8355, 4407, 2366, 1387, 842, 614, 556

    * Each node is an HP BL460c Gen8 with dual Intel Xeon E5-2680 v2 @ 2.8 GHz, 20 cores and 128 GB of 1600 MHz DIMM per node, InfiniBand FDR

    • Hybrid outperforms pure MPI when using 8 nodes and more

    • Recommendation: one MPI process per socket and as many OpenMP threads as physical cores

    • For the same number of nodes, RADIOSS 13 on E5 v2 is ~1.5X faster than RADIOSS 12 on E5
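The placement recommendation on this slide reduces to simple arithmetic, sketched below in Python. The function name `hybrid_layout` and its parameters are hypothetical; the sketch assumes one MPI process per socket and one OpenMP thread per physical core of that socket, as recommended above, and uses the node description from the footnote (dual socket, 20 cores per node).

```python
def hybrid_layout(nodes, sockets_per_node, cores_per_node):
    """Return (mpi_processes, omp_threads_per_process, total_cores).

    Assumes one MPI process per socket and as many OpenMP threads
    as there are physical cores on that socket.
    """
    if cores_per_node % sockets_per_node != 0:
        raise ValueError("cores must divide evenly across sockets")
    mpi_processes = nodes * sockets_per_node
    omp_threads = cores_per_node // sockets_per_node
    return mpi_processes, omp_threads, mpi_processes * omp_threads


if __name__ == "__main__":
    # Largest configuration of the study: 64 dual-socket nodes with 20 cores each.
    processes, threads, cores = hybrid_layout(64, 2, 20)
    print(f"{processes} MPI processes x {threads} OpenMP threads = {cores} cores")
```

For the 64-node case this gives 128 MPI processes with 10 threads each, which matches the 1280 cores quoted in the chart title.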

  • 15

    RADIOSS Development: Intel Cluster Studio Tools

    At Altair, we actively use

    • Intel Fortran and C/C++ compilers

    • Intel Math Kernel Library (MKL)

    • Intel MPI Library

    • Intel VTune Amplifier for performance analysis

    • Intel Trace Analyzer and Collector for MPI analysis

    On

    • Linux, Windows and Mac OS X (compilers)

    • Xeon and Xeon Phi

  • 16

    Compilers & Math Library

    • ifort and icc

      • Use the highest level of optimization flags that preserves correctness & accuracy

      • Check compiler reports (-vec-report=3)

      • Optimize for different platforms (-ax)

      • Static link of compiler libraries

      • Upgrade to and validate new compiler releases to benefit from the latest optimizations (12.1 & 13)

    • MKL

      • Available with compiler package

      • Optimized for new hardware

      • OpenMP support
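As a rough illustration of how the compiler options on this slide combine, the Python sketch below assembles an `ifort` command line without running it. Only `-vec-report=3` and `-ax` come from the slide; the `-O3` level, the `AVX` target, the use of `-static-intel` for the static link of compiler libraries, and the file names are assumptions made for the example, not Altair's actual build settings.

```python
import shlex


def ifort_command(sources, output, targets=("AVX",), opt_level=3):
    """Build (but do not run) an ifort command line as a list of arguments.

    opt_level, targets and the static-link flag are illustrative assumptions.
    """
    command = ["ifort", f"-O{opt_level}"]
    command.append("-vec-report=3")              # vectorization report, as on the slide
    command.append("-ax" + ",".join(targets))    # extra code paths for the listed targets
    command.append("-static-intel")              # assumed flag for static Intel libraries
    command += ["-o", output]
    command += list(sources)
    return command


if __name__ == "__main__":
    # Hypothetical source and executable names.
    print(shlex.join(ifort_command(["solver.F"], "solver_exe")))
```

Keeping the command as a list makes it easy to lower the optimization level for routines where the higher level affects accuracy.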

  • 17

    Intel MPI Library

    MPI: Message Passing Interface, required for communication between processes in a distributed-memory environment

    • Dynamically linked to support the latest installed versions and hardware at the customer site

    • Easy optimization of data and process placement in Hybrid MPI OpenMP

      • KMP_AFFINITY=scatter or compact

      • I_MPI_PIN_DOMAIN=auto or omp

    • Scalability at large scale over InfiniBand

      • I_MPI_DEVICE=rdssm or I_MPI_FABRICS=shm:dapl for InfiniBand

    • Intel MPI has proven efficient at large scale with RADIOSS
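The environment variables listed on this slide can be gathered into a single launch environment, as in the minimal Python sketch below. The helper name `hybrid_env` and its default values are hypothetical; `OMP_NUM_THREADS` is the standard OpenMP variable and does not appear on the slide. The right choice between `scatter` and `compact`, or between `auto` and `omp`, depends on the system, so the defaults are placeholders.

```python
def hybrid_env(threads_per_process, affinity="compact",
               pin_domain="omp", fabrics="shm:dapl"):
    """Return environment settings for a hybrid MPI/OpenMP run.

    Accepted values mirror the options named on the slide; defaults are
    illustrative and not a tuning recommendation.
    """
    if affinity not in ("scatter", "compact"):
        raise ValueError("affinity must be 'scatter' or 'compact'")
    if pin_domain not in ("auto", "omp"):
        raise ValueError("pin_domain must be 'auto' or 'omp'")
    return {
        "OMP_NUM_THREADS": str(threads_per_process),  # standard OpenMP variable (assumption)
        "KMP_AFFINITY": affinity,                     # thread placement inside a process
        "I_MPI_PIN_DOMAIN": pin_domain,               # cores reserved for each MPI process
        "I_MPI_FABRICS": fabrics,                     # shared memory plus InfiniBand
    }


if __name__ == "__main__":
    for name, value in hybrid_env(10).items():
        print(f"export {name}={value}")
```

The printed `export` lines can be pasted into a job script ahead of the MPI launch command.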

  • 18

    Profiling Analysis: Intel VTune Amplifier 1/4

    Run amplxe-gui directly with the optimized binary

    Basic Hotspots to begin with

  • 19

    Profiling Analysis: Intel VTune Amplifier 2/4

    Compatible with MPI, OpenMP and Hybrid

  • 20

    Profiling Analysis: Intel VTune Amplifier 3/4

    Very useful Bottom-up analysis to identify top routines to optimize
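What a bottom-up view computes can be sketched in a few lines of Python. This is not VTune's implementation or file format: it assumes hypothetical samples made of a call stack (outermost caller first) and the time attributed to that sample, and it ranks routines by self time, which is the ordering used to pick the top routines to optimize. All routine names are invented for the example.

```python
from collections import Counter


def bottom_up(samples):
    """Rank routines by self time.

    samples: iterable of (call_stack, seconds), call_stack outermost-first,
    so the last entry is the routine that was executing.
    """
    self_time = Counter()
    for call_stack, seconds in samples:
        self_time[call_stack[-1]] += seconds
    return self_time.most_common()


if __name__ == "__main__":
    # Hypothetical samples from an explicit time-stepping loop.
    samples = [
        (("main", "time_step", "contact_search"), 4.0),
        (("main", "time_step", "element_forces"), 2.5),
        (("main", "time_step", "contact_search"), 1.5),
        (("main", "io_write"), 0.5),
    ]
    for routine, seconds in bottom_up(samples):
        print(f"{routine:16s} {seconds:5.1f} s")
```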

  • 21

    Profiling Analysis: Intel VTune Amplifier 4/4

    Recompiling with -g allows profiling at instruction level in the source code

    Then go back to the compiler report output

  • 22

    MPI: Intel Trace Analyzer and Collector (ITAC) 1/3

    With a dynamically linked Intel MPI program:

    1) Run mpirun -trace executable to collect the trace data

    2) Run traceanalyzer executable.stf

    Start with the Event Timeline chart

    By default, all the MPI processes and OpenMP threads appear

    Zoom in to dig into details

  • 23

    MPI: Intel Trace Analyzer and Collector (ITAC) 2/3

    The Function Profile and Message Profile charts give additional information

    OpenMP threads within the same MPI process can be grouped (or ungrouped)

  • 24

    MPI: Intel Trace Analyzer and Collector (ITAC) 3/3

    Click on a particular zone of the chart to get detailed information about the current

    • MPI function called

    • Message sent

    By using distinct message tags in the code, it is easy to identify the matching subroutine directly
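The message-tag idea can be sketched as a small registry that gives each communication phase its own tag, so that a tag seen in the trace maps back to a routine. The Python below only illustrates the bookkeeping; the tag values and routine names are hypothetical, and in the real code the tags would be passed to the MPI send and receive calls.

```python
from enum import IntEnum


class MsgTag(IntEnum):
    """One distinct tag per communication phase (values are arbitrary)."""
    FORCE_EXCHANGE = 101
    CONTACT_EXCHANGE = 102
    ENERGY_REDUCE = 103


# Hypothetical routine names, one per tag.
TAG_TO_ROUTINE = {
    MsgTag.FORCE_EXCHANGE: "exchange_forces",
    MsgTag.CONTACT_EXCHANGE: "exchange_contacts",
    MsgTag.ENERGY_REDUCE: "reduce_energy",
}


def routine_for_tag(tag):
    """Map a tag read from the trace back to the routine that uses it."""
    try:
        return TAG_TO_ROUTINE[MsgTag(tag)]
    except ValueError:
        return f"unknown (tag {tag})"


if __name__ == "__main__":
    for seen in (101, 102, 999):
        print(seen, "->", routine_for_tag(seen))
```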

  • 25

    Takeaways

    • Highly parallel and optimized industrial code with the Altair RADIOSS solver

    • Fruitful collaboration between Altair and Intel in terms of code development, tools, support, access to computational resources, and co-marketing activities

    • Powerful programming environment developed by Intel that matches our development needs

    • Importance of dedicated tools for performance optimization, with Intel VTune Amplifier and Intel Trace Analyzer and Collector

    • Availability for Xeon and Xeon Phi secures our investment and commitment in this technology for the long term

  • Innovation Intelligence®

    Thank you for attending!

    Eric Lequiniou

    [email protected]