
Page 1

Looking back Into the (parallel) HPC Future
“The changing HPC Landscape”

HPC Advisory Council Conference, Lugano, Switzerland, 22 March 2016

Herbert Cornelius, Wojciech Wasko – Intel EMEA

www.intel.com/hpc

Page 2

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSIONCRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS,DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICALAPPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel, Intel Xeon, Intel Xeon Phi™, Intel® Atom™ are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.

Copyright © 2015, Intel Corporation

*Other brands and names may be claimed as the property of others.

Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase. The cost reduction scenarios described in this document are intended to enable you to get a better understanding of how the purchase of a given Intel product, combined with a number of situation-specific variables, might affect your future cost and savings. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs.

Intel® Advanced Vector Extensions (Intel® AVX)* are designed to achieve higher throughput to certain integer and floating point operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you should consult your system manufacturer for more information.

*Intel® Advanced Vector Extensions refers to Intel® AVX, Intel® AVX2 or Intel® AVX-512. For more information on Intel® Turbo Boost Technology 2.0, visit http://www.intel.com/go/turbo

All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Copyright © 2016 Intel Corporation. All rights reserved.

Page 3


Risk Factors

The above statements and any others in this document that refer to plans and expectations for the second quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Words such as “anticipates,” “expects,” “intends,” “plans,” “believes,” “seeks,” “estimates,” “may,” “will,” “should” and their variations identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel’s actual results, and variances from Intel’s current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be important factors that could cause actual results to differ materially from the company’s expectations. Demand for Intel's products is highly variable and, in recent years, Intel has experienced declining orders in the traditional PC market segment. Demand could be different from Intel's expectations due to factors including changes in business and economic conditions; consumer confidence or income levels; customer acceptance of Intel’s and competitors’ products; competitive and pricing pressures, including actions taken by competitors; supply constraints and other disruptions affecting customers; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers. Intel operates in highly competitive industries and its operations have high costs that are either fixed or difficult to reduce in the short term. Intel's gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; and product manufacturing quality/yields. Variations in gross margin may also be caused by the timing of Intel product introductions and related expenses, including marketing expenses, and Intel's ability to respond quickly to technological developments and to introduce new products or incorporate new features into existing products, which may result in restructuring and asset impairment charges. Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Intel’s results could be affected by the timing of closing of acquisitions, divestitures and other significant transactions. Intel's results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues, such as the litigation and regulatory matters described in Intel's SEC filings. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. A detailed discussion of these and other factors that could affect Intel’s results is included in Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release.

Rev. 4/15/14

Page 4

HPC Everywhere

Industrial Revolution 4.0

• 1.0 – Steam (1760’s): Coal, Railways, Factories; Printing Press, Mass Education

• 2.0 – Electrification (1860’s): Communication, Oil, Combustion Engine; New Materials, Highways, Automobiles; Mass Production

• 3.0 – Digitization (2000’s): WWW, Molecular Biology; Green Energy, Mobility; Automation

• 4.0 – FUTURE VISION (2020’s): Cloud Computing, Super Information Data Highways; Personalized Medicine, Industrial Internet, M2M, Internet of Things (IoT); Smart “everything”, Big Data (Analytics); AI, ML/DL


Page 5


Computing has Changed

Page 6

Microprocessors: Then and Now

Intel® 4004 (1971): 10,000 nm process, 2,300 transistors, 92 KOPS

Intel® Xeon Phi™ (2016)**: 14 nm 3D Tri-Gate, >8B transistors, 3 TFLOPS (peak DP-F.P.)

**Codename Knights Landing, not drawn to scale

For illustration only, not drawn to scale. Potential future options are forecasts and subject to change without notice.

Transforming the Economics of HPC


Page 7

High-Performance Computing

• 1970’s: CRAY-1 – Proprietary
• 1990’s: ASCI RED – Industry Standards
• 2010’s: KNIGHTS LANDING** – Miniaturization

**For illustration only, not drawn to scale. Potential future options are forecasts and subject to change without notice.


Page 8

Driving Innovation and Integration
Enabled by Leading Edge Process Technologies

Integrated Today Possible Tomorrow**

SYSTEM LEVEL BENEFITS IN COST, POWER, DENSITY, SCALABILITY & PERFORMANCE

**For illustration only, not drawn to scale. Potential future options are forecasts and subject to change without notice.


Page 9


Page 10

Extreme Overclocking Achievements

†Source: HWBOT article dated 2015-08-07 http://hwbot.org/newsflash/2983_skylake_the_day_after_7_world_records_and_10_global_first_places

Disclaimer: These overclocking results are not typical. Scores were achieved by extreme overclockers using LN2 and other advanced techniques not commonly available to average consumers. Overclocking results are not guaranteed nor covered by warranty. Extreme risk taking! Rankings change regularly; visit HWBOT’s website for the latest: http://hwbot.org

Frequency standings using liquid nitrogen†, as of 14-Apr-2016: 4 cores @ 6.8 GHz, DDR4 @ 4795 MHz


Page 11

Parallelism on all Levels

• SIMD: Vectorization
• CPU/Cores: Multi-Threading
• Nodes: Messaging

From a single Core to Multi-Core (CPU), Many-Core (CPU), Accelerators (FPGA), and Nodes (Cluster & Fabric).

For illustration only, not drawn to scale.
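To see all three levels in one place, here is a minimal dot-product sketch, assuming an MPI + OpenMP toolchain (e.g., mpicxx -fopenmp); the problem size and output are purely illustrative:

    // Parallelism on all levels: MPI across nodes, OpenMP threads across
    // cores, OpenMP SIMD within a core.
    #include <mpi.h>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                  // NODES: messaging
        int rank = 0, nranks = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const std::size_t n = 1 << 20;           // this rank's share of the data
        std::vector<double> a(n, 1.0), b(n, 2.0);
        double local = 0.0;

        // CPU/CORES: multi-threading; SIMD: vectorization of the inner loop
        #pragma omp parallel for simd reduction(+:local)
        for (std::size_t i = 0; i < n; ++i)
            local += a[i] * b[i];

        double global = 0.0;                     // combine the partial sums
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            std::printf("dot = %.1f across %d ranks\n", global, nranks);
        MPI_Finalize();
        return 0;
    }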


Page 12

Emerging Intel HPC Technologies

• COMPUTE: Xeon Phi™
• ACCELERATION: FPGA
• FABRIC: Omni-Path
• I/O: SiPh (Silicon Photonics)
• MEMORY/STORAGE: 3D XPoint™


Page 13

• 14nm Tri-Gate process technology
• High-density FPGA fabric with up to 5.5 million logic elements (LEs)
• Up to 10 TFLOPS SP-F.P. (IEEE-754) performance
• Up to 100 GFLOPS/W of single-precision floating-point efficiency


Page 14

The application performs post-processing of 3D textures for analyzing rock samples:
• Code labels 128x128x128-pixel textures in memory and calculates overlap
• Large data size: >20 GB input file, >8M textures

Kernel code (copy each texture into a buffer, then label it):

    for (iz = -tw/2; iz < tw/2; iz++) {
      for (iy = -tw/2; iy < tw/2; iy++) {
        for (ix = -tw/2; ix < tw/2; ix++) {
          /* Copy texture into buffer */
          buf[(iz + tw/2)*tw*tw + (iy + tw/2)*tw + ix + tw/2] =
              image[(z + iz)*dimx*dimy + (y + iy)*dimx + (x + ix)];
        }
      }
    }
    /* Label the texture */
    label_texture(buf, tw, tw, tw, val_min, val_max, nl, nb, bodies,
                  matrix_a, matrix_a_b, matrix_a_c, matrix_a_d);

Goal: offload the kernel to an accelerator to label 8 million textures in 30 minutes.

• 438x K80 cards: 4 racks, 116 kW
• 162x Stratix V A7 cards (Nallatech*): 2 racks, 13.7 kW – 8.5x less energy

Source: M. Hilgeman, Dell Accelerating Understanding Summit 2015


Page 15

Vision Paper 2005:
• Tens of billions of transistors per chip (as predicted by Moore’s Law)
• Chip-Level Multiprocessing (CMP) – Multi/Many-Core Architectures
• Data and Thread Level Parallelism – SIMD and MT/HT Technologies
• Instruction Level Parallelism (ILP)


Page 16

Tera-Scale Research White Paper 2006

Software/Hardware differentiation options

**For illustration only, not drawn to scale. Potential future options are forecasts and no indication of product plans.

Page 17

High-Performance Computing: New Compute Paradigms

• PAST: Single Core (CPU) – transistor constraint
• PRESENT: Multi-Core, Many-Core – power constraint
• FUTURE?: Mix of Cores, integrating FPGA, Accelerators, ...** – even more power constraint

**For illustration only, not drawn to scale. Potential future options are forecasts and no indication of product plans.


Page 18

The Need for New Memory

Source: Energy Aware Memory Technology and New Memory System Hierarchy, Frank Koch, Samsung Semiconductor, ENA-HPC 2013 Conference, Sep’2013


Bandwidth – Latency – Capacity – Power

Source: Tera-Scale Memory Challenges and Solutions, Intel Technology Journal, Vol. 13, Issue 4, 2009


Page 19

Highly Parallel Processing

Knights Landing – next-gen Intel® Xeon Phi™ processor:
• >3 TF peak DP performance
• 3X faster single-thread performance vs. KNC
• 5X faster MCDRAM vs. DDR4 DIMMs

Memory hierarchy: CPU – MCDRAM – DDR – NAND SSD – Hard Disk Drives

For illustration only. All dates, product descriptions, features, availability, and plans are forecasts and subject to change without notice.


Page 20


MCDRAM Modes – KNIGHTS LANDING

Cache mode
• No source code changes required
• Misses are expensive (higher latency): need MCDRAM access + DDR access

Flat mode
• MCDRAM mapped to the physical address space
• Exposed as a NUMA node
• Explicitly allocated/accessed

Hybrid
• Combination of the above, e.g., 8 GB in cache mode + 8 GB in flat mode
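Because flat-mode MCDRAM is just another NUMA node, allocations can also be expressed through the lower-level memkind API rather than hbwmalloc. A minimal sketch, assuming memkind's documented memkind_malloc/memkind_free calls; MEMKIND_HBW_PREFERRED falls back to DDR when no high-bandwidth node exists:

    #include <memkind.h>
    #include <cstddef>
    #include <cstdio>

    int main() {
        const std::size_t n = 1000;
        // Prefer MCDRAM; fall back to DDR if no HBW NUMA node is available
        // (plain MEMKIND_HBW would fail instead of falling back).
        float *fv = (float *)memkind_malloc(MEMKIND_HBW_PREFERRED,
                                            sizeof(float) * n);
        if (!fv) return 1;
        for (std::size_t i = 0; i < n; ++i) fv[i] = 1.0f;
        std::printf("allocated %zu floats, preferably in MCDRAM\n", n);
        memkind_free(MEMKIND_HBW_PREFERRED, fv);
        return 0;
    }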



Page 23


MCDRAM IN FLAT MODE ACCESS – MEMKIND LIBRARY

Allocate 1000 floats from DDR:

    float *fv;
    fv = (float *)malloc(sizeof(float) * 1000);

Allocate 1000 floats from MCDRAM:

    #include <hbwmalloc.h>

    float *fv;
    fv = (float *)hbw_malloc(sizeof(float) * 1000);

Allocate arrays from MCDRAM and DDR in the Intel® Fortran Compiler:

    c     Declare arrays to be dynamic
          REAL, ALLOCATABLE :: A(:), B(:), C(:)
    !DEC$ ATTRIBUTES FASTMEM :: A
          NSIZE=1024
    c     Allocate array 'A' from MCDRAM
          ALLOCATE (A(1:NSIZE))
    c     Allocate arrays that will come from DDR
          ALLOCATE (B(NSIZE), C(NSIZE))
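One practical companion to these snippets: hbwmalloc lets a program probe for high-bandwidth memory before committing to it. A minimal sketch, assuming the documented hbw_check_available() (returns 0 when HBW nodes are present); the DDR fallback policy here is just one reasonable choice:

    #include <hbwmalloc.h>
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const std::size_t n = 1000;
        const bool have_hbw = (hbw_check_available() == 0);
        // Fall back to regular DDR when no MCDRAM is present.
        float *fv = have_hbw ? (float *)hbw_malloc(sizeof(float) * n)
                             : (float *)malloc(sizeof(float) * n);
        if (!fv) return 1;
        std::printf("using %s\n", have_hbw ? "MCDRAM" : "DDR");
        if (have_hbw) hbw_free(fv); else free(fv);
        return 0;
    }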


Page 25


READ THE MANUAL

I could walk you through the entire memkind API... The engineer in me would very much enjoy that*, but... 10 minutes, remember?

So, as the saying goes: a picture is worth a 1k words, and a demo is worth 10e6 thereof.

*That might – however – not be true of the audience...

Page 26


A BW-hungry demo: “STREAM” addition

One function will operate on different memory operands: one input set in regular DDR, the other input set in MCDRAM. The input sets contain the same data.

    results_t run_bench(data_t *A, data_t *B, data_t *C, const char *id)
    {
        // ... begin timing

        //!! START WORKLOAD
        for (size_t iter = 0; iter < NUM_ITERATIONS; ++iter) {
            #pragma omp parallel for simd
            for (size_t i = 0; i < NUM_ELEMENTS; ++i) {
                C[i] = A[i] + B[i];
            }
        }
        //!! END WORKLOAD

        // ... end timing
    }
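Since each element of this add kernel moves two reads and one write, its timing converts directly into sustained bandwidth. A small sketch of that arithmetic; the element count, iteration count, and elapsed time below are placeholders, not measurements:

    #include <cstddef>
    #include <cstdio>

    // C[i] = A[i] + B[i] moves 3 * sizeof(data_t) bytes per element.
    double add_kernel_gib_per_s(std::size_t elements, std::size_t iterations,
                                std::size_t elem_size, double seconds) {
        const double bytes = 3.0 * elements * elem_size * iterations;
        return bytes / seconds / (1024.0 * 1024.0 * 1024.0);
    }

    int main() {
        // Illustrative only: 2^28 doubles, 10 iterations, 4.0 s elapsed.
        std::printf("%.1f GiB/s\n",
                    add_kernel_gib_per_s(std::size_t(1) << 28, 10,
                                         sizeof(double), 4.0));
        return 0;
    }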

Page 27


A BW-hungry workload – “INFRASTRUCTURE”

    int main()
    {
        // ALLOCATING REGULAR MEM (DDR)
        data_t *A_reg = (data_t *)malloc(sizeof(data_t) * NUM_ELEMENTS);
        data_t *B_reg = (data_t *)malloc(sizeof(data_t) * NUM_ELEMENTS);
        data_t *C_reg = (data_t *)malloc(sizeof(data_t) * NUM_ELEMENTS);

        // ALLOCATING HIGH-BW MEM (MCDRAM)
        data_t *A_hbw = (data_t *)hbw_malloc(sizeof(data_t) * NUM_ELEMENTS);
        data_t *B_hbw = (data_t *)hbw_malloc(sizeof(data_t) * NUM_ELEMENTS);
        data_t *C_hbw = (data_t *)hbw_malloc(sizeof(data_t) * NUM_ELEMENTS);

        init(A_reg, B_reg, C_reg, A_hbw, B_hbw, C_hbw);

        // RUNNING BENCHMARK
        auto res_reg = run_bench(A_reg, B_reg, C_reg, "[REG]");
        auto res_hbw = run_bench(A_hbw, B_hbw, C_hbw, "[HBW]");

        // RESULTS OUTPUT
        std::cout << "Computations happened " << res_reg/res_hbw
                  << "x times faster in high-bandwidth memory.\n";

        // FREEING REGULAR MEM
        free(A_reg);
        free(B_reg);
        free(C_reg);

        // FREEING HIGH-BW MEM
        hbw_free(A_hbw);
        hbw_free(B_hbw);
        hbw_free(C_hbw);
    }



Page 34


A BW-hungry workload RESULTS – KNIGHTS LANDING

    $ ./run.sh
    [REG] Calculations took 15376.5 [units].
    [HBW] Calculations took 3056.19 [units].
    Computations happened 5.03125x times faster in high-bandwidth memory.

NOTE: those are some units, only here to show the relative performance difference.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804

Page 35


BUT I DON’T HAVE A KNL (YET)!

You can use Intel® QPI on a multi-socket machine to emulate high- and low-bandwidth memory:

    $ MEMKIND_HBW_NODES=0 OMP_NUM_THREADS=4 \
      numactl --membind=1 --cpunodebind=0 ./memkind_arrs
    [REG] calculations took 63716.1 [other-units].
    [HBW] calculations took 27865.6 [other-units].
    Computations happened 2.28656x times faster in high-bandwidth memory.

Meaning: you can play with memkind (or just use it) even if you don’t have an MCDRAM-equipped machine!
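To verify where an allocation really landed, whether on a genuine KNL or under the QPI emulation above, the kernel can be asked directly. A minimal sketch using Linux's move_pages(2) in query mode (a standard libnuma facility, not a memkind API; link with -lnuma):

    #include <hbwmalloc.h>
    #include <numaif.h>   // move_pages(2)
    #include <cstdio>

    int main() {
        // Touch the allocation so a physical page exists, then ask which
        // NUMA node backs it (nodes == nullptr puts move_pages in query mode).
        double *p = (double *)hbw_malloc(sizeof(double) * 1024);
        if (!p) return 1;
        p[0] = 1.0;
        void *page = p;
        int status = -1;
        if (move_pages(0, 1, &page, nullptr, &status, 0) == 0)
            std::printf("hbw_malloc page is on NUMA node %d\n", status);
        hbw_free(p);
        return 0;
    }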




Page 36


By now I hope to have captured your attention.

Heterogeneous memory architectures are happening. Please, give it a try.

It’s important for us, but it’s primarily important for you.


Page 37


Keep in touch

Join IXPUG (the Intel Xeon Phi User's Group):
• New Memory Types WG – describes usage models, reviews requirements, and establishes best known methods for new memory types
• Sign-up form: https://docs.google.com/forms/d/1FoRHl6NDn7u0ALnGRtMF5q2X3R1h1MeCDLxJJOy55rs/viewform

Join the memkind mailing list: https://lists.01.org/mailman/listinfo/memkind

Patches welcome! https://github.com/memkind/memkind


Page 38

SSD Capacity (R)Evolution

Future options are forecasts and subject to change without notice.


3D-NAND

Image: Micron.


Page 39

NVM


Page 40

New MEMORY AND STORAGE: 3D XPoint™ Technology

• 1000X faster than NAND
• 1000X the endurance of NAND
• 10X denser than DRAM

Memory/storage hierarchy: CPU – MCDRAM – DDR – Intel® DIMMs – Intel® Optane™ SSD – NAND SSD – Hard Disk Drives

For illustration only. All dates, product descriptions, features, availability, and plans are forecasts and subject to change without notice.


Page 41

3D XPoint™ Technology (NVM)

Intel® Optane™ SSDs and DIMMs** based on 3D XPoint™:

NEW CLASS OF NON-VOLATILE MEMORY**: 4x more memory capacity at ½ the cost of DRAM

NEW CLASS OF NON-VOLATILE STORAGE: 1000x faster than NAND, 1000x the endurance of NAND

**For illustration only, future potential options are forecasts and subject to change without notice.


Page 42

The New HPC Fabric

Intel® Omni-Path™ Architecture:
• 100 Gbit/s per port
• 1.3X higher switch density
• 2.3X greater fabric scalability
• Fewer switches required

[Chart: switch chips required (0–600) vs. number of nodes, comparing the Intel® Omni-Path 48-port switch against an InfiniBand* 36-port switch]

Number of switch chips required, switch density, and fabric scalability are based on a full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for Intel® Omni-Path Architecture and a 36-port switch ASIC for either Mellanox* or Intel® True Scale Fabric. 2.3X fabric scalability based on a 27,648-node cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes. *Other names and brands may be claimed as the property of others.


Page 43

Intel® Silicon Photonics

www.intel.com/content/www/us/en/research/intel-labs-silicon-photonics-research.html

For illustration only. All dates, product descriptions, features, availability, and plans are forecasts and subject to change without notice.

THE ONLY ON-DIE INTEGRATED LASER
• Longest reach at 2 km
• Highest port density
• >20% cost advantage


Page 44


Software Defined VISUALIZATION
http://www.sdvis.org/


Page 45

Unified Architecture for HPC & HPDA

• Programming Model: FORTRAN / C++ applications with MPI (HPC: high performance) vs. Java* applications on Hadoop*/Spark* (HPDA: simple to use)
• Software Libraries: HPC-optimized libraries and HPDA-optimized libraries
• Resource Manager: HPC & Big Data-aware resource manager
• File System: Lustre* with Hadoop* adapter; remote storage
• Hardware: compute & Big Data capable, scalable performance components; server storage (SSDs and burst buffers)
• Infrastructure: Intel® Omni-Path Architecture


Page 46

A Holistic Design Solution for HPC: Intel® Scalable System Framework (SSF)

Small Clusters Through Supercomputers

Compute and Data-Centric Computing

Standards-Based Programmability

On-Premise and Cloud-Based

Tighter Integration

Intel® Xeon® Processors

Intel® Xeon Phi™ Processors

Intel® Xeon Phi™ Coprocessors

Intel® Server Boards and Platforms

Intel® Solutions for Lustre*

Intel® SSDs

Intel® Optane™ Technology

3D XPoint™ Technology

Intel® Omni-Path Architecture

Intel® True Scale Fabric

Intel® Ethernet

Intel® Silicon Photonics

HPC System Software Stack

Intel® Software Tools

Intel® Cluster Ready Program

Intel® Visualization Toolkit


Page 47

Aurora System: HPC AND Big Data

It’s one more landmark. It’s the next one we have to reach. But the journey does not stop there.

Aurora System, Argonne National Laboratory: >180 PFLOPS of extreme performance for a broad range of compute- and data-centric workloads, coming 2018.


Page 48

“The GAP”: Software Modernization


Page 49

Intel Parallel Computing Centers: software.intel.com/en-us/ipcc
Intel Code Modernization Program: software.intel.com/en-us/code-modernization-enablement
Intel Parallel Universe Magazine: software.intel.com/en-us/intel-parallel-universe-magazine
Modern Code Developer Community: software.intel.com/en-us/modern-code


Page 50


Thank You.

Copyright © 2016 Intel Corporation. All rights reserved.