UTILIZING ACCELERATORS TO SPEEDUP ETL, ML, AND DL APPLICATIONS

Jason Lowe and Robert Evans, 05/19/2020



AGENDA

Accelerated ETL

Accelerated SQL/Dataframe

Accelerated Shuffle

What's Next


ACCELERATED ETL?

https://www.piqsels.com/en/public-domain-photo-zrkia

Can a GPU make an elephant fast?


YES

TPCx-BB-Like Benchmark Results (10 TB Dataset, Two-Node DGX-2 Cluster)*

Query Time: GPU vs. CPU (minutes)

Query      CPU     GPU
#5       25.95    1.31
#16       6.16    1.16
#21       7.13    0.56
#22       3.80    0.14

Environment: Two DGX-2 (96 CPU cores, 1.5 TB host memory, 16 V100 GPUs, 512 GB GPU memory)

* Not official or complete TPCx-BB runs (ETL power only).


MODERN ML/DL WORKFLOW

[Diagram: Data Sources, Ingest, Load, Transform, and Data Store on CPU Compute; Model Training on GPU Compute]


APACHE SPARK 2.X

CLUSTER MANAGEMENT/DEPLOYMENT (YARN, K8S, Standalone)

DISTRIBUTED, SCALE-OUT DATA SCIENCE AND AI APPLICATIONS

CPU Infrastructure

ACCELERATED ML/DL FRAMEWORKS

XGBoost TensorFlow

PyTorch Horovod

SPARK 2.x CORE

APACHE SPARK COMPONENTS

Spark SQL/DF GraphX

Streaming MLlib


SPARK 3.X IS A UNIFIED AI PLATFORM

END-TO-END APACHE SPARK 3.0 PIPELINE

CLUSTER MANAGEMENT/DEPLOYMENT (YARN, K8S, Standalone)

DISTRIBUTED, SCALE-OUT DATA SCIENCE AND AI APPLICATIONS

GPU-Accelerated Infrastructure

ACCELERATED ML/DL FRAMEWORKS

XGBoost TensorFlow

PyTorch Horovod

SPARK 3.0 CORE

APACHE SPARK COMPONENTS

Spark SQL/DF GraphX

Streaming MLlib

RAPIDS Accelerator for Apache Spark


ETL + ML/DL WORKFLOW

[Diagram: Data Sources, Ingest, Load, Transform, Data Store, and Model Training, all on GPU Compute]


DEEP LEARNING RECOMMENDATION MACHINES

Example use case: Criteo dataset

Anonymized 7-day clickstream dataset (1 TB)

Convert high-cardinality string categorical data to contiguous integer IDs

The DLRM GitHub repo has scripts for this out of the box

https://medium.com/analytics-vidhya/deep-learning-recommendation-machines-dlrm-4fec2a5e7ef8
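The conversion of string categorical values to contiguous integer IDs can be illustrated with a minimal sketch. This is not the DLRM repository's script; the function names and the sample values are hypothetical, and a real 1 TB run would need a distributed dictionary build rather than an in-memory dict.

```python
from typing import Dict, Iterable, List


def build_id_map(values: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct string a contiguous integer id in first-seen order."""
    id_map: Dict[str, int] = {}
    for value in values:
        if value not in id_map:
            id_map[value] = len(id_map)
    return id_map


def encode(values: Iterable[str], id_map: Dict[str, int]) -> List[int]:
    """Replace every string with its integer id."""
    return [id_map[value] for value in values]


# Hypothetical hashed categorical values (illustration only).
column = ["68fd1e64", "a1b2c3d4", "68fd1e64", "05db9164", "a1b2c3d4"]
id_map = build_id_map(column)
encoded = encode(column, id_map)
print(encoded)  # [0, 1, 0, 2, 1]
```

The ids run from 0 to (number of distinct values - 1) with no gaps, which is the property an embedding table lookup needs.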


DLRM ON CRITEO DATASET (PAST)

ETL & Training Run Time for CPU & GPU, Criteo Dataset (1 TB)

Configuration              Time (hours)
ETL (1 core CPU)*                 144.0
Spark ETL (96 core CPU)            12.1
Training (96 core CPU)             45.0
Training (1 - V100)                 0.7

* Extrapolated; couldn't convince anyone to wait that long


DLRM ETL ON CRITEO DATASET (PRESENT)

Spark ETL for Criteo Dataset (1 TB)

Configuration              Time (hours)
Spark ETL (96 core CPU)            12.1
Spark ETL (1 - V100)                2.3
Spark ETL (8 - V100)                0.5


DLRM END-TO-END ON CRITEO DATASET (PRESENT)

Spark ETL + Training for Criteo Dataset (1 TB), time in hours

Configuration                                                    ETL    Training
Original CPU (1 core for ETL, 96 core CPU for training)        144.0        45.0
Spark CPU (96 core for ETL & training)                          12.1        45.0
Spark CPU (96 core for ETL) & Spark GPU (1-V100 training)       12.1         0.7
Spark GPU (8-V100 for ETL & 1-V100 training)                     0.5         0.7

160x faster than original

48x faster than CPU (4% the cost)

10x faster than typical (1/6th the cost)
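The speedup callouts follow from the ETL and training times on this slide. The short calculation below sums each configuration and divides by the all-GPU total; the dictionary keys are shortened labels chosen for this sketch. The cost figures on the slide cannot be derived from these times alone and are not reproduced.

```python
# ETL and training times in hours, copied from the chart on this slide.
configs = {
    "Original CPU": {"etl": 144.0, "training": 45.0},
    "Spark CPU": {"etl": 12.1, "training": 45.0},
    "Spark CPU ETL + GPU training": {"etl": 12.1, "training": 0.7},
    "Spark GPU": {"etl": 0.5, "training": 0.7},
}

# End-to-end time per configuration.
totals = {name: t["etl"] + t["training"] for name, t in configs.items()}

# Speedup of the all-GPU pipeline relative to each configuration.
gpu_total = totals["Spark GPU"]
speedups = {name: total / gpu_total for name, total in totals.items()}

for name in configs:
    print(f"{name}: {totals[name]:.1f} h total, {speedups[name]:.1f}x vs. Spark GPU")
```

The totals come to about 189, 57.1, 12.8, and 1.2 hours, giving ratios of roughly 158x, 48x, and 11x, which is consistent with the rounded 160x, 48x, and 10x on the slide.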


“The more you buy, the more you save.”

— Jensen Huang, GTC 2018


RAPIDS ACCELERATOR FOR APACHE SPARK (PLUGIN)

DISTRIBUTED SCALE-OUT SPARK APPLICATIONS

APACHE SPARK CORE

Spark SQL API / DataFrame API

  if gpu_enabled(operation, data_type)
    call-out to RAPIDS
  else
    execute standard Spark operation

Spark Shuffle

  ● Custom implementation of Spark Shuffle
  ● Optimized to use RDMA and GPU-to-GPU direct communication

RAPIDS Accelerator for Spark

JNI bindings (mapping from Java/Scala to C++) over RAPIDS C++ Libraries

JNI bindings (mapping from Java/Scala to C++) over UCX Libraries

CUDA
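The per-operation fallback in the slide's pseudocode can be sketched as follows. This is only an illustration of the decision, not the plugin's implementation: the support table, `choose_backend`, and the sample plan are hypothetical, and the real plugin operates on Spark physical plan nodes rather than string pairs.

```python
from typing import List, Set, Tuple

# Hypothetical support table: (operation, data_type) pairs that have a GPU path.
GPU_SUPPORTED: Set[Tuple[str, str]] = {
    ("max", "double"),
    ("min", "double"),
    ("upper", "string"),
}


def gpu_enabled(operation: str, data_type: str) -> bool:
    """True when this operation on this data type can run on the GPU."""
    return (operation, data_type) in GPU_SUPPORTED


def choose_backend(operation: str, data_type: str) -> str:
    """Mirror the slide's if/else: call out to RAPIDS or fall back to Spark."""
    if gpu_enabled(operation, data_type):
        return "RAPIDS"
    return "Spark CPU"


# Hypothetical sequence of operations in one query.
plan: List[Tuple[str, str]] = [("max", "double"), ("upper", "string"), ("max", "decimal")]
backends = [choose_backend(op, dt) for op, dt in plan]
print(backends)  # ['RAPIDS', 'RAPIDS', 'Spark CPU']
```

Because the check is made per operation and data type, an unsupported combination falls back to the CPU without failing the whole query.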


SQL/DATAFRAME PLUGIN


No Code Changes

Same SQL and DataFrame code.

(none)


WHAT WE SUPPORT

!  %  &  *  +  -  /  <  <=  <=>  =  ==  >  >=  ^  |  ~

abs, acos, and, asin, atan, avg, bigint, boolean, cast, cbrt, ceil, ceiling, coalesce, concat, cos, cosh, cot, count, cube, current_date, current_timestamp, date, datediff, day, dayofmonth, degrees, double, e, exp, expm1, first, first_value, float, floor, from_unixtime, hour, if, ifnull, in, initcap, input_file_block_length, input_file_block_start, input_file_name, int, isnan, isnotnull, isnull, last, last_value, lcase, like, ln, locate, log, log10, log1p, log2, lower, max, mean, min, minute, mod, monotonically_increasing_id, month, nanvl, negative, not, now, nullif, nvl, nvl2, or, pi, posexplode*, position, pow, power, radians, rand*, regexp_replace*, replace, rint, rollup, row_number, second, shiftleft, shiftright, shiftrightunsigned, sign, signum, sin, sinh, smallint, spark_partition_id, sqrt, string, substr, substring, sum, tan, tanh, timestamp, tinyint, trim, ucase, upper, when, window, year

CSV Reading*, ORC Reading, ORC Writing, Parquet Reading, Parquet Writing

ANSI casts, TimeSub for time ranges, startswith, endswith, contains

limit, order by, group by, filter, union, repartition, equi-joins, select

and growing…


IS THIS A SILVER BULLET?

NO

Small amounts of data

Few hundred MB per partition for GPU

Highly cache coherent processing

Data Movement

Slow I/O (networking, disks, etc.)

Going back and forth to the CPU (UDFs)

Shuffle

Limited GPU Memory

[Chart: bandwidth in MB/s (log scale); values shown: 160, 550, 1250, 3500, 12288, 24576, 25600, 46080, 307200, 1048576]


BUT IT CAN BE AMAZING

What the SQL plugin excels at

High cardinality joins

High cardinality aggregates

High cardinality sort

Window operations (especially on large windows)

Complicated processing

Transcoding (Writing Parquet and ORC is hard, reading CSV is hard)


HOW DOES IT WORK


SPARK SQL & DATAFRAME COMPILATION FLOW

QUERY

SELECT product_id, ds, max(price) - min(price) AS range FROM bar GROUP BY product_id, ds

bar.groupBy(col("product_id"), col("ds"))
  .agg((max(col("price")) - min(col("price"))).alias("range"))

CPU PHYSICAL PLAN

DataFrame -> Logical Plan -> Physical Plan -> RDD[InternalRow]


SPARK SQL & DATAFRAME COMPILATION FLOW

QUERY

SELECT product_id, ds, max(price) - min(price) AS range FROM bar GROUP BY product_id, ds

bar.groupBy(col("product_id"), col("ds"))
  .agg((max(col("price")) - min(col("price"))).alias("range"))

GPU PHYSICAL PLAN

DataFrame -> Logical Plan -> Physical Plan -> RDD[InternalRow]

Physical Plan -> RAPIDS SQL Plugin -> GPU Physical Plan -> RDD[ColumnarBatch]
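To make the difference between row-oriented and columnar data concrete, the sketch below evaluates the slide's query (max(price) - min(price) per product_id and ds) over a small columnar batch in plain Python. The sample data and the `price_range` helper are hypothetical; this shows only the query semantics and the data layout, not how Spark or the plugin executes the aggregation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A hypothetical columnar batch of table `bar`: one list per column.
batch: Dict[str, list] = {
    "product_id": [1, 1, 2, 2, 1],
    "ds": ["2020-05-18", "2020-05-18", "2020-05-18", "2020-05-19", "2020-05-19"],
    "price": [10.0, 14.5, 7.0, 9.0, 11.0],
}

# The same data viewed row by row: one tuple per record.
rows: List[Tuple[int, str, float]] = list(
    zip(batch["product_id"], batch["ds"], batch["price"])
)


def price_range(batch: Dict[str, list]) -> Dict[Tuple[int, str], float]:
    """Compute max(price) - min(price) for each (product_id, ds) group."""
    groups: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for pid, ds, price in zip(batch["product_id"], batch["ds"], batch["price"]):
        groups[(pid, ds)].append(price)
    return {key: max(prices) - min(prices) for key, prices in groups.items()}


result = price_range(batch)
for key in sorted(result):
    print(key, result[key])
```

In the columnar layout each column is a contiguous sequence of one type, which is the shape GPU libraries operate on; the row view interleaves types within each record.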


SPARK SQL & DATAFRAME COMPILATION FLOW

GPU PHYSICAL PLAN

Read Parquet File, First Stage Aggregate, Shuffle Exchange, Combine Shuffle Data, Second Stage Aggregate, Write Parquet File

CPU PHYSICAL PLAN

Read Parquet File, First Stage Aggregate, Shuffle Exchange, Second Stage Aggregate, Write Parquet File

Convert to Row Format (shown twice in the diagram)


ETL TECHNOLOGY STACK

Python stack: Dask cuDF; cuDF, Pandas -> Python -> Cython -> cuDF C++ -> CUDA Libraries -> CUDA

Spark stack: Spark dataframes, Scala, PySpark -> Java -> JNI bindings -> cuDF C++ -> CUDA Libraries -> CUDA


ACCELERATED SHUFFLE


SPARK SHUFFLE

Data Exchange Between Stages

Stage 1: Task 0, Task 1, Task 2

Stage 2: Task 0, Task 1
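A shuffle redistributes the output of every Stage 1 task so that all records with the same key reach the same Stage 2 task. The sketch below models this with three map outputs and two reduce partitions, matching the task counts in the diagram. The sample records are hypothetical, and CRC32 is used only as a deterministic stand-in; it is not Spark's partitioning hash.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]

NUM_REDUCE_TASKS = 2  # Stage 2 has two tasks in the diagram


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic hash partitioner (CRC32 is a stand-in, not Spark's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(map_outputs: List[List[Record]], num_partitions: int) -> Dict[int, List[Record]]:
    """Route every record from every map task to the partition that owns its key."""
    partitions: Dict[int, List[Record]] = defaultdict(list)
    for task_output in map_outputs:
        for key, value in task_output:
            partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical output of the three Stage 1 tasks.
stage1_outputs: List[List[Record]] = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]
stage2_inputs = shuffle(stage1_outputs, NUM_REDUCE_TASKS)
for partition in sorted(stage2_inputs):
    print(f"Stage 2 task {partition}: {stage2_inputs[partition]}")
```

Every Stage 2 task may read from every Stage 1 task, which is why shuffle performance depends so heavily on how the data moves between them.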


SPARK SHUFFLE

CPU-Centric Data Movement

[Diagram: GPU 0, GPU 1, CPU, Local Storage, and Network connected over the PCI-e Bus]


ACCELERATED SPARK SHUFFLE

GPU-Centric Data Movement

[Diagram: GPU 0, GPU 1, CPU, Local Storage, and Network on the PCI-e Bus, with NVLink, RDMA, and GPU Direct Storage paths]


ACCELERATED SPARK SHUFFLE

Shuffling Spilled Data

[Diagram: GPU 0, GPU 1, CPU, Host Memory, Local Storage, and Network on the PCI-e Bus, with an RDMA path]


UCX LIBRARY

Unified Communication X

Abstracts communication transports

Selects best available route(s) between endpoints

TCP, RDMA, Shared Memory, GPU

Zero-copy GPU memory transfers over RDMA

RDMA requires network support (IB or RoCE)

http://openucx.org


ACCELERATED SHUFFLE RESULTS

Inventory Pricing Query

Configuration    Query Duration (seconds)
CPU                                   228
GPU                                    45
GPU+UCX                               8.4


ACCELERATED SHUFFLE RESULTS

ETL for Logistic Regression Model

Configuration    Query Duration (seconds)
CPU                                  1556
GPU                                   172
GPU+UCX                                79


WHAT’S NEXT?


WHAT’S NEXT

Open Source/Spark 3.0 Release

Nested types: Arrays, Structs, and Maps

Decimal type

More operators

GPU Direct Storage

Time zone support for timestamps (only UTC for now)

Higher order functions

UDFs

COMING SOON

FURTHER OUT


WHERE TO GET MORE INFO

Learn more about the RAPIDS Accelerator for Apache Spark

Visit: NVIDIA.com/Spark

Please use “contact us” to get in touch with NVIDIA’s Spark team

Listen to how Adobe Email Marketing Intelligent Services leverages the RAPIDS Accelerator & Spark 3.0 on Databricks

Upcoming Spark+AI Summit Sessions on GPU support for Apache Spark 3.0:

Deep Dive into GPU Support in Apache Spark 3.x

Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters

Preview of Spark 3.0 GPU Features: NVIDIA.com/Spark-Book

QUESTIONS


BACKUP SLIDES


FAQS

Q: What are the minimum requirements?

A: The RAPIDS accelerator requires:

Apache Spark 3.0

RAPIDS cudf 0.14

CUDA 10.1 or later

NVIDIA GPU with Pascal architecture or later

Ubuntu 16.04+ or CentOS 7+


FAQS

Q: Do all cluster nodes require GPUs?

A: All Spark executors running with the RAPIDS accelerator require their own GPU.

The Spark driver process does not require a node with a GPU.

Q: Can I run more than one executor per GPU?

A: No, there must be a one-to-one mapping between Spark executors and GPUs.

You can run more than one concurrent task per executor.


FAQS

Q: Will the RAPIDS accelerator work in the cloud?

A: Yes, if the VM environment meets the minimum requirements.

Q: Will the RAPIDS accelerator be available for Apache Spark 2.x?

A: No. The columnar processing APIs added in Apache Spark 3.0 are required.

Q: How can I tell if an operation is being accelerated?

A: Accelerated operations appear in the query explanation and SQL UI.


RAPIDS ACCELERATOR CONFIGURATION

spark.rapids.sql.enabled is the master enable switch

spark.rapids.sql.explain enables logging of operations not accelerated

spark.rapids.sql.concurrentGpuTasks controls concurrent task count per GPU
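As a rough sketch of how these three settings could be passed on a command line, the snippet below renders them as `--conf key=value` arguments. The keys are the ones listed on this slide; the values are assumptions for illustration (in particular, `NOT_ON_GPU` for the explain setting is taken from later plugin documentation and may differ by version), and `to_conf_args` is a hypothetical helper.

```python
from typing import Dict, List

# Keys come from the slide; the values are illustrative assumptions.
rapids_conf: Dict[str, str] = {
    "spark.rapids.sql.enabled": "true",
    "spark.rapids.sql.explain": "NOT_ON_GPU",  # assumed value; check your version
    "spark.rapids.sql.concurrentGpuTasks": "2",
}


def to_conf_args(conf: Dict[str, str]) -> List[str]:
    """Render a config dict as a flat list of --conf key=value arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_conf_args(rapids_conf)))
```

Keeping the settings in one dictionary makes it easy to toggle the master enable switch for a CPU-versus-GPU comparison of the same job.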


SPARK ACCELERATOR-AWARE SCHEDULING

Tracking JIRA: SPARK-24615

Request executor and driver resources (GPU, FPGA, etc.)

Resource discovery

Specify task resources

API to determine assigned resources

YARN, Kubernetes, and Standalone
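For resource discovery, Spark's documentation describes a discovery script that prints a JSON object with a resource name and an array of addresses. The sketch below only formats that output; `discovery_output` is a hypothetical helper, and a real script would obtain the GPU indices from the system (for example by querying the driver tooling) instead of taking them as arguments.

```python
import json
from typing import List


def discovery_output(resource_name: str, addresses: List[str]) -> str:
    """Format discovered resource addresses as the JSON a discovery script prints."""
    return json.dumps({"name": resource_name, "addresses": addresses})


# Hypothetical host with two GPUs at indices 0 and 1.
print(discovery_output("gpu", ["0", "1"]))
# {"name": "gpu", "addresses": ["0", "1"]}
```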


SPARK ACCELERATOR-AWARE SCHEDULING

Sample Command-Line

./bin/spark-shell --master yarn --executor-cores 2 \
  --conf spark.driver.resource.gpu.amount=1 \
  --conf spark.driver.resource.gpu.discoveryScript=/opt/spark/getGpuResources.sh \
  --conf spark.executor.resource.gpu.amount=2 \
  --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
  --conf spark.task.resource.gpu.amount=1 \
  --files examples/src/main/scripts/getGpusResources.sh
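The resource amounts in the sample command determine how many tasks can run at once on each executor: the scarcer of CPU cores and GPUs sets the limit. The function below is a simplified model of that accounting, not Spark's scheduler code; it assumes the default of one CPU core per task (`spark.task.cpus`), and the function name is hypothetical.

```python
def concurrent_tasks_per_executor(
    executor_cores: int,
    executor_gpus: float,
    task_gpus: float,
    task_cpus: int = 1,  # assumed default of spark.task.cpus
) -> int:
    """Task slots per executor: limited by whichever resource runs out first."""
    slots_by_cpu = executor_cores // task_cpus
    slots_by_gpu = int(executor_gpus / task_gpus)
    return min(slots_by_cpu, slots_by_gpu)


# Values from the sample command: 2 cores, 2 GPUs per executor, 1 GPU per task.
print(concurrent_tasks_per_executor(executor_cores=2, executor_gpus=2, task_gpus=1))  # 2

# Hypothetical alternative: one GPU shared by four tasks via a fractional amount.
print(concurrent_tasks_per_executor(executor_cores=4, executor_gpus=1, task_gpus=0.25))  # 4
```

With the sample command's values both limits equal two, so neither cores nor GPUs sit idle while the executor is fully scheduled.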


SPARK STAGE LEVEL SCHEDULING

Tracking JIRA: SPARK-27495

Specify task resource requirements per RDD operation

Dynamically allocates containers to meet resource requirements

Schedules tasks on appropriate containers