
Page 1: Accelerating Time to Insight - ECMWF Events (Indico)

Accelerating Time to Insight
Performance-optimized emerging system architectures

Balint Fleischer
Senior Director, Advanced Computing Solutions
September 25th, 2019

©2018 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are subject to change without notice. All information is provided on an “AS IS” basis without warranties of any kind. Statements regarding products, including regarding their features, availability, functionality, or compatibility, are provided for informational purposes only and do not modify the warranty, if any, applicable to any product. Drawings may not be to scale. Micron, the Micron logo, and all other Micron trademarks are the property of Micron Technology, Inc. All other trademarks are the property of their respective owners.

Page 2:

Estimated data: 1 Zettabyte = 10^21 (one billion trillion) bytes. Rapidly growing Dark Data.

Rapidly growing data sets from many sources
+ Increasingly complex algorithms
+ Need for faster Time to Insight
+ Affordability challenges
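The scale of a zettabyte is easy to misread; a two-line check of the definition as stated on the slide:

```python
# Verify the slide's definition: 1 zettabyte = 10**21 bytes,
# i.e. seven decimal steps of 1000 up from a byte.
ZB = 10**21
assert ZB == 1000**7          # byte -> kB -> MB -> GB -> TB -> PB -> EB -> ZB

# Expressed in terabytes: one billion TB.
print(f"1 ZB = {ZB / 10**12:.0e} TB")
```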

Page 3:

Perfect storm: Moore’s law is slowing, Dennard scaling is ending, and the von Neumann architecture has become a bottleneck.

B. Meisner, "The Bump in the Road to ExaFlops and Rethinking LINPACK," HPC User Forum, June 2014

Page 4: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

Large,

industry wide

approach but

without a

master plan

to insure

continued

scaling

New, more scalable

Architectures

Reducing

Data access cost

Making IO a first

Class “citizen”

Bridge the latency gap with

New Memory technologies

4

Page 5:

New, non-von Neumann architectures scale well (for a while) with process technology improvements.

Page 6:

General-purpose CPU performance CAGR is declining.

Page 7:

Machine Learning + Deep Learning is emerging as the key application for data analytics, and a rich target for domain-specific computing.

Page 8:

On a given technology node, Domain Specific Architectures deliver greater performance on targeted applications.

Pattern for domain-specific architectures:
- Simpler, lower-performance, more efficient processing elements
- High degree of parallelism (1000s of processing elements)
- Highly optimized on-die data movement
- New, application-specific ISA
- Custom compiler
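The pattern above can be sketched in miniature. This is a toy illustration of ours, not any real accelerator's design: many simple processing elements, each limited to multiply-accumulate, working in parallel on their own shard of the data.

```python
# Toy sketch of the domain-specific pattern: many simple PEs, each doing
# only multiply-accumulate on its own data slice, with a final reduction.
from concurrent.futures import ThreadPoolExecutor

def pe_mac(weights, inputs):
    """One simple processing element: multiply-accumulate only."""
    return sum(w * x for w, x in zip(weights, inputs))

def dot(weights, inputs, n_pe=8):
    """Shard the vectors across n_pe processing elements and reduce."""
    chunk = (len(weights) + n_pe - 1) // n_pe
    slices = [(weights[i:i + chunk], inputs[i:i + chunk])
              for i in range(0, len(weights), chunk)]
    with ThreadPoolExecutor(max_workers=n_pe) as pool:
        partials = pool.map(lambda s: pe_mac(*s), slices)
        return sum(partials)

w = [0.5] * 1024
x = [2.0] * 1024
print(dot(w, x))   # 1024 * (0.5 * 2.0) = 1024.0
```

A real design would replace the thread pool with fixed on-die dataflow and an application-specific ISA; the point here is only the shape: simple elements, wide parallelism, one reduction.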

Page 9:

An example of a Domain Specific Architecture delivering greater performance: ~400% performance jump vs. 25% for CPU. Optimized AI processors deliver even better gains.

NVIDIA Volta architecture white paper

Page 10: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

There are over 300 AI processor designs worldwide innovating from Low End to High End

10

EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge

Hot Chips 2019, Cerebras Presentation

Claimed performance

2.8 TOPS @1.6W

~2x faster vs. CPU

1/15th power vs. CPU
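The claimed figures imply an efficiency ratio worth making explicit. The derived numbers below are our own arithmetic on the slide's claims, not vendor-published metrics.

```python
# Back-of-envelope check of the claimed edge-accelerator figures.
tops = 2.8          # claimed throughput, tera-operations/s
watts = 1.6         # claimed power draw

efficiency = tops / watts
print(f"{efficiency:.2f} TOPS/W")           # 1.75 TOPS/W

# "~2x faster at 1/15th the power" implies an energy-per-task advantage:
speedup = 2.0
power_ratio = 1.0 / 15.0
energy_advantage = speedup / power_ratio    # 2x work rate / (1/15) power
print(f"~{energy_advantage:.0f}x less energy per unit of work vs. CPU")
```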

Page 11:

The “cost” of IO is critical to performance scaling and energy consumption. The goal is improving bandwidth, reducing latency, and reducing data movement.

Page 12:

Improving data proximity is a vehicle to address IO cost. Various approaches place the processing element at different points in the system.

* For simplicity, single-socket CPU hosts are shown.

Page 13: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

NAND die capacity

continues to grow

enabling more

dense SSDs and

storage solutions

13

Difficult to Beat the NAND Cost Structure

Page 14:

SSD IO bandwidth is lagging SSD capacity growth. Queries of very large data sets will take an increasingly long time.

Page 15:

NAND stacks and SSDs are designed for capacity scaling. Device-level aggregate bandwidth is ~100x the SSD external IO bandwidth; die-level bandwidth is ~1000x.

Page 16:

Moving processing into the SSD can benefit from the high embedded bandwidth. Time to insight on very large, shardable data will also benefit from massive parallelism.

Today: host (CPU + DRAM) link of 54 GB/s against 652 GB/s aggregate IO bandwidth at the SSD connectors (24 x 13.6 GB/s). The host sees only ~8% of the drive-bay bandwidth.

Potential internal bandwidth: 9.6 TB/s aggregate untapped raw internal NAND bandwidth (24 x 400 GB/s); the 54 GB/s host link is only ~0.6% of the NAND bandwidth. Usable internal bandwidth can be as much as ~25% of raw.

Opening up this bandwidth to applications via parallel processing coupled with Near Data Computing enables faster time to insight.

Note: In both cases there are 24 drives per JBOF with an identical PCIe Gen5 NVMe interface.
* For simplicity, single-socket CPU hosts are shown.
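The ratios above can be sanity-checked with a few lines of arithmetic. One assumption is ours: that the 652 GB/s aggregate counts both PCIe directions (24 drives x 13.6 GB/s x 2), since 24 x 13.6 alone is only ~326 GB/s; the slide lists just "24 x 13.6 GB/s".

```python
# Sanity-check of the JBOF bandwidth ratios from the slide.
drives = 24
host_link = 54.0                  # GB/s, host CPU link
per_drive_dir = 13.6              # GB/s per SSD connector, per direction (assumed)
nand_raw_per_drive = 400.0        # GB/s raw internal NAND BW per drive

aggregate_connector = drives * per_drive_dir * 2   # ~652.8 GB/s
aggregate_nand = drives * nand_raw_per_drive       # 9600 GB/s = 9.6 TB/s

print(f"connector aggregate: {aggregate_connector:.0f} GB/s")
print(f"host share of drive-bay BW: {host_link / aggregate_connector:.1%}")
print(f"host share of raw NAND BW:  {host_link / aggregate_nand:.2%}")
```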

Page 17:

Near Memory Computing can deliver 5-10x better bandwidth to applications in a 2.5D packaging technology.

Page 18:

Near Memory Computing example: tighter integration of processing and memory can lead to even greater performance at lower power [Kim et al., NeuroCube, ISCA 2016].

Page 19:

On-die integration of compute (processing elements) and memory can take advantage of ~160x on-die bandwidth on smaller data sets.

Page 20: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

20

Large

investment into

improving

Platform IO

Page 21:

- Increasing integrated IO lane count to grow connectivity
- Faster PCIe will enable higher-speed devices
- Silicon photonics to extend reach

* For simplicity, single-socket CPU hosts are shown.

Page 22:

Emerging media technologies to improve capacity, latency, and access methods.

Page 23: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

Emerging Memory

research is

intensifying

23

Difficult to beat

DRAM Performance

& Energy

But

DRAM density growth

is slowing

Lower

LatencyHigher

Density

Page 24:

Emerging Memories can fill data access latency gaps and enable new storage models.

Page 25:

Attaching EM (Emerging Memory) requires server architecture changes. CXL is the emerging standard for EM attach.

Today, PCIe Gen4: the host (CPU + DRAM) uses IO semantics to reach IO devices, SSDs, and accelerators; memory semantics apply only to DRAM.

~2022, CXL over PCIe Gen5 (2x IO bandwidth): the host reaches IO devices and SSDs with IO semantics, and Emerging Memory and coherent accelerators with memory semantics, i.e. memory + IO semantics on one link.

* For simplicity's sake, single CPU hosts are shown.

Page 26: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

CXL enables

innovations ranging

from different

memory types to

heterogeneous

computing

26

CXL

Block

EMM

CXL

LD/ST

EMM

CXL

LD/ST

DRAM

CXL

AI Engine

NAND

CXL

Near Memory

DRAM

Some possible memory examples

Page 27: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

Scaling Emerging

Memory capacity

is critical to address

use case

requirements

27

~TB

* For simplicity, single CPU Hosts are shown

CXL

CPU

DRAM

Ho

st

Many TB

Memory Semantics Memory Semantics

CPU

DRAM

Ho

st

54 GB/s652 GB/s

(24x13.6 GB/s)CXL expander

CXL

Emerging memory based

Expansion modules

Page 28: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

The collection of

these new

technologies and

architectures will

impact how we

build future Data

Centers and

Systems

28

Page 29:

Co-locating compute and data requires changes in data placement, provisioning, and load-balancing strategies:
- Strict provisioning and data placement rules
- Emergence of non-uniform resource pools
- Complex data partitioning models

* For simplicity, single-socket CPU hosts are shown.
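One way to picture such a strategy is capacity-weighted shard placement across a non-uniform pool. This is a hypothetical sketch of ours, not from the deck; the node names, capacities, and shard count are invented for illustration.

```python
# Hypothetical: place shards proportionally to each node's local capacity,
# so compute can land near the data it will scan (largest-remainder method).
nodes = {                       # invented pool: name -> usable TB
    "jbof-smartssd": 96,
    "cxl-expander": 16,
    "host-dram": 2,
}
shards = 100                    # arbitrary example shard count

total = sum(nodes.values())
exact = {n: shards * cap / total for n, cap in nodes.items()}
alloc = {n: int(v) for n, v in exact.items()}          # floor of each share
leftover = shards - sum(alloc.values())
# Hand remaining shards to the nodes with the largest fractional remainders.
for n in sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)[:leftover]:
    alloc[n] += 1

assert sum(alloc.values()) == shards
print(alloc)
```

A real placement policy would also weigh per-node bandwidth, failure domains, and load, which is exactly why the slide calls these models "complex".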

Page 30: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

Heterogeneous

Resource

Pools

Example

30

* For simplicity single socket CPU Hosts are shown

Page 31: Accelerating Time to Insight - ECMWF Events (Indico) · EETimes 08.29.10 Details of Hailo AI Edge Accelerator Emerge Hot Chips 2019, Cerebras Presentation Claimed performance 2.8

Exciting times.

The Data Growth and the need for Faster Insight drives transitioning from decades old architectures to a new, emerging model utilizing

breakthrough technologies

Summary

Buckle your seatbelt!

31

Page 32:

Thank you!

[email protected]