Tesla: Fastest Processor Adoption in HPC History · 10/2/2009 · CUDA Ecosystem Applications Libraries FFT BLAS LAPACK Image processing Video processing Signal processing Vision

Tesla: Fastest Processor Adoption in HPC History

Jen-Hsun Huang

Co-founder, President and CEO of NVIDIA

June 30th 2009

R8 Ray Tracing.wmv

19955,000 triangles/second800,000 transistors GPU

2008350 Million triangles/second1.4 Billion transistors GPU

L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1

L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1 L1

GPU for Computing

Massively parallel, throughput architecture

Science in Desperate for Computing Throughput

1982 1997 2003 2006 2010 2012

1,000,000,000

1,000,000

1,000

1

Gigaflops

Estrogen Receptor36K atoms

F1-ATPase327K atoms

Ribosome2.7M atoms

Chromatophore50M atoms

BPTI3K atoms

Bacteria100s of

Chromatophores

1 Exaflop

1 Petaflop

Ran for 8 months to simulate 2 nanoseconds

Power Crisis in Supercomputing

1982 1996 2008 2020

Exaflop

Petaflop

Teraflop

Gigaflop

Household Power

Equivalent

City

Town

Neighborhood

Block

7,000,000 Watts

25,000,000 Watts

850,000 Watts

60,000 Watts

Jaguar

Los Alamos

The GPU Computing Discontinuity

0

200

400

600

800

1000

1200

9/22/2002 2/4/2004 6/18/2005 10/31/2006 3/14/2008

Gflops(log scale) NVIDIA GPU

Intel CPU

Tesla 8-series

Tesla 10-series

Intel Xeon Quad-core 3 GHzIntel Pentium 4

3.2 GHz

Intel Pentium 4Dual-core 3.0 GHz

Intel Core2Dual-core 3.0 GHz

Double Precision

debut

4 cores

Advent of GPU Computing

CPU + GPU Co-Processing

GPU Computing Applications

C

C++

Java

FortranOpenCLtm DirectX

Compute

NVIDIA GPUCUDA Parallel Computing Architecture

OpenCL is trademark of Apple Inc. used under license to the Khronos Group Inc.

CUDA GPU Computing Architecture

CUDA Ecosystem

Applications Libraries

FFTBLAS

LAPACKImage processingVideo processingSignal processing

Vision

Consultants OEMs

Languages

C, C++DirectXFortranJava

OpenCLPython

Compilers

PGI FortranCAPS HMPP

MCUDAMPI

NOAA Fortran2COpenMP

UIUCMIT

HarvardBerkeley

CambridgeOxford

…

IIT DelhiDortmundtETH Zurich

Uni. PerpignanEcole CentraleParis 6 Jussieu

…

Over 200 Universities Teaching CUDA

Oil & Gas Finance

Medical Biophysics

Numerics

Imaging

CFD

DSP EDA

Debuggers

AlineaTotalView

http://www.supermicro.com/

http://en.wikipedia.org/wiki/File:Logo_groupe_bull.jpg

http://images.google.com/imgres?imgurl=http://fishtrain.com/wp-content/uploads/2007/09/cray_logo.gif&imgrefurl=http://fishtrain.com/2007/09/03/nvidias-playbook/&usg=__mBEPjqB6tUo0mps50ld866NdmmI=&h=70&w=160&sz=3&hl=en&start=8&sig2=erIWlru80_C67bxBapde6g&tbnid=ooG9_suq3ywK-M:&tbnh=43&tbnw=98&prev=/images?q=cray+logo&gbv=2&hl=en&ei=aHYpSvyWEo-ctgPd-dXxCg

http://www.google.com/imgres?imgurl=http://blog.taragana.com/wp-content/uploads/2009/05/nec-logo.jpg&imgrefurl=http://blog.taragana.com/index.php/t/east-asia/&h=354&w=354&sz=8&tbnid=YJa5kHMJJ5aMmM:&tbnh=121&tbnw=121&prev=/images?q=NEC+logo&hl=en&usg=__vqs8CIGTn2HFsKXlXcsnKjhGaww=&ei=Q98zSsTUG4vWsgPysrDODg&sa=X&oi=image_result&resnum=2&ct=image

Tesla GPU Computing Products

Tesla S1070

1U System

Tesla C1060

Computing Board

GPUs 4 Tesla GPUs 1 Tesla GPU

Single Precision

Performance4.14 Teraflops 933 Gigaflops

Double Precision

Performance346 Gigaflops 78 Gigaflops

Memory 16 GB (4 GB / GPU) 4 GB

M$

Performance

100x

1x

10,000x

TraditionalCPU Cluster

CPU Workstation

K$

TeslaPersonal

Supercomputer

Tesla Co-processing

Cluster

New Class of Co-Processing Supercomputers

2 TeslaM1060 GPUs

Up to 18 Tesla M1060 GPUs

Bull Bullx

Blade Enclosure

SuperMicro 1U

GPU Server

Finance: Equity Pricing

2 Tesla S1070 500 CPU Servers

2.8 kWatts 37.5 kWatts

$24 K $250 K

16x Less Space

13x Lower Power

10x Lower Cost

Equal Performance1 1

Oil & Gas: Seismic Processing

~$400 K ~$8 M

45 kWatts 1,200 kWatts27x Lower Power

20x Lower Cost

Equal Performance1 1

32 Tesla S1070 2,000 CPU Servers31x Less Space

192 TFlops GPU

256 TFlops GPU

CEA-DAM CCRT

TOTAL Seismic Processing

Tesla: Helping solve the critical HPC challenges

GPU

T8

128 core

T10

240 core

A 2015 GPU *

~20x the performance of today’s GPU

~5,000 cores at ~3GHz (50mW each)

~20 TFLOPS

~1.2TB/s of memory bandwidth

* This is a sketch of a what a GPU in 2015 might look like, it does not reflect any actual product plans

GPU Revolutionizing Computing

GFlops

GPU Technology Conference

Sept 30 – Oct 2, 2009

San Jose, CA

www.nvidia.com/gtc

We bring Solutions to your Questions

http://www.nvidia.com/gtc

Thank You!

NVIDIA GPU Computing Links

NVIDIA CUDA Zone

NVIDIA High Performance Computing Solutions

NVIDIA Tesla S1070 – Product Description

NVIDIA Tesla C1060 – Product Description

Tesla Personal Supercomputer

Tesla Personal Supercomputer – Where to Buy?

YouTube – Tesla videos

Jean-Christophe Baratault

EMEA GPU Computing Sales

[email protected]

Cell +33 6 8036 8483

http://www.nvidia.com/object/cuda_home.html

http://www.nvidia.com/object/tesla_computing_solutions.html

http://www.nvidia.com/object/tesla_computing_solutions.html

http://www.nvidia.com/object/product_tesla_s1070_us.html




http://www.nvidia.com/object/product_tesla_c1060_us.html





http://www.nvidia.com/object/personal_supercomputing.html

http://www.nvidia.com/object/tesla_supercomputer_wtb.html




http://www.youtube.com/nvidiatesla







mailto:[email protected]

Documents

Tesla: Fastest Processor Adoption in HPC History · 10/2/2009 · CUDA Ecosystem Applications Libraries FFT BLAS LAPACK Image processing Video processing Signal processing Vision