USF Spark Workshop
Ilya Ganelin
Overview
• Goal:
• Understand how Spark internals drive design and configuration
• Contents:
• Background
• Partitions
• Caching
• Serialization
• Shuffle
• Lessons 1-4
• Experimentation, debugging, exploration
• ASK QUESTIONS.
Background
Partitions, Caching, and Serialization
• Partitions
• How data is split on disk
• Affects memory / CPU usage and shuffle size
• Caching
• Persist RDDs in distributed memory
• Major speedup for repeated operations
• Serialization
• Efficient movement of data
• Java vs. Kryo
• Spark Architecture / Workflow
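Before diving into shuffle, a minimal PySpark sketch tying together the partition, caching, and serialization knobs just described; the path, app name, and partition count are illustrative assumptions, not workshop code:

from pyspark import SparkConf, SparkContext, StorageLevel

# Kryo is faster and more compact than the default Java serialization.
conf = (SparkConf()
        .setAppName("background-demo")  # hypothetical app name
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
sc = SparkContext(conf=conf)

rdd = sc.textFile("hdfs:///data/events")  # hypothetical path; initial partitions follow HDFS blocks
rdd = rdd.repartition(200)                # choose the partition count deliberately
rdd.persist(StorageLevel.MEMORY_ONLY)     # keep the RDD in distributed memory
rdd.count()                               # first action materializes the cache
rdd.count()                               # second pass reads from memory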
Shuffle?
Shuffle!
• All-to-all operations
• reduceByKey, groupByKey
• Data movement
• Serialization
• Akka
• Memory overhead
• Spills to disk when out of memory
• Garbage collection
• EXPENSIVE!
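To see why the all-to-all operations above differ in cost, a toy sketch (assuming a live SparkContext sc):

pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)] * 1000)

# groupByKey serializes and ships every (key, value) pair, then sums.
sums_slow = pairs.groupByKey().mapValues(sum)

# reduceByKey combines values within each partition first, so far less
# data crosses the network.
sums_fast = pairs.reduceByKey(lambda x, y: x + y)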
Map Reduce
Spark Architecture
Lessons
Lesson 1: Spark is a problem child!
• Memory
• You're using more than you think
• JVM garbage collection
• Spark metadata (shuffles, long-running jobs)
• Scala & Java type overhead
• Shuffle / heap / YARN
• Debugging is hard
• Distributed logs
• Hundreds of tasks
Lesson 1: Discipline
• Tame the beast (memory)
• Partition wisely
• Know your data!
• Size, Types, Distribution
• Kryo Serialization
• Cleanup
• Long-running jobs consume memory indefinitely
• Spark context cleanup fails in production environments
• Solution: YARN!
• Separate spark-submits per batch (see the sketch below)
• Stable Spark-based job that runs for weeks
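One way to realize "separate spark-submits per batch" is a small launcher loop; the batch directory and driver script below are hypothetical, not the workshop's actual setup:

import glob
import subprocess

# One independent YARN application per batch: executor memory and shuffle
# metadata are reclaimed when each application exits.
for batch in sorted(glob.glob("/data/batches/*")):   # hypothetical layout
    subprocess.run(["spark-submit", "--master", "yarn",
                    "--executor-memory", "20g",
                    "process_batch.py", batch],      # hypothetical script
                   check=True)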
Spark Memory Structure
spark.executor.memory – defines the total amount of memory available to the executor.
spark.storage.memoryFraction – defines the fraction (0.6 by default) of the total memory to use for storing persisted RDDs.
spark.shuffle.memoryFraction – defines the fraction of memory to reserve for shuffle (0.2 by default).
Typically don't touch: spark.storage.unrollFraction and spark.storage.safetyFraction. These exist primarily for certain internal constructs and size estimation, and default to 20% and 10% respectively.
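As a worked example under this (legacy, pre-1.6) memory model, here is how the default fractions carve up the 20 GB executor used in the launch command at the end of these slides:

executor_mb = 20 * 1024         # spark.executor.memory = 20g
storage_mb = 0.6 * executor_mb  # ~12288 MB for persisted RDDs
shuffle_mb = 0.2 * executor_mb  # ~4096 MB reserved for shuffle
# The remaining ~20% is working memory for task objects and the like.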
• yarn.nodemanager.resource.memory-mb – controls the maximum sum of memory used by the containers on each node.
• --executor-memory / spark.executor.memory – controls the executor heap size.
• JVMs can use memory off heap (interned Strings and direct byte buffers).
• spark.yarn.executor.memoryOverhead + executor memory determine the memory request to YARN for each executor. The overhead defaults to max(384, .07 * spark.executor.memory).
• YARN may round the requested memory up a little:
• yarn.scheduler.minimum-allocation-mb
• yarn.scheduler.increment-allocation-mb
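Putting those rules together for the same 20 GB executor, a quick arithmetic check:

executor_mb = 20 * 1024
overhead_mb = max(384, 0.07 * executor_mb)  # = 1433.6 MB
request_mb = executor_mb + overhead_mb      # ~21914 MB asked of YARN
# YARN rounds this up per yarn.scheduler.increment-allocation-mb, and the
# per-node total must fit under yarn.nodemanager.resource.memory-mb.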
Lesson 2: Avoid shuffles!
• Why?
• Speed up execution
• Increase stability
• ????
• Profit!
• How?
• Custom partitioning
• Use the driver! (see the sketch below)
• Collect
• Broadcast
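A sketch of the collect-and-broadcast pattern above, which turns a shuffle join into a map-side lookup (toy data; the small side must fit in driver memory):

small_rdd = sc.parallelize([("a", "apple"), ("b", "berry")])
big_rdd = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

lookup = sc.broadcast(dict(small_rdd.collect()))  # shipped once per executor

# Map-side join: no shuffle; each task reads its local broadcast copy.
joined = big_rdd.map(lambda kv: (kv[0], (kv[1], lookup.value.get(kv[0]))))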
Lesson 3: Using the driver is hard!
• Limited memory
• Collected RDDs
• Metadata
• Results (Accumulators)
• Akka messaging
• 10^7 records x 120 bytes ≈ 1.2 GB across 20 partitions
• ~60 MB read per partition, but the default Akka frame size is only 10 MB
• Solution: partition appropriately and set akka.frameSize – know your data! (sketched below)
• Big data
• Solution: Batch process
• Problem: Cleanup and long-term stability
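The frame-size fix in code, reusing the slide's numbers (1.2 GB collected over 20 partitions is ~60 MB per task result, far above the 10 MB default); rdd here stands for the large dataset being collected:

# Option 1: raise the Akka frame limit (in MB) above the per-partition
# result size. Must be set before the SparkContext is created.
conf = SparkConf().set("spark.akka.frameSize", "128")

# Option 2: repartition so each task's result stays under the limit;
# 1.2 GB over 200 partitions is ~6 MB per task instead of ~60 MB.
rdd = rdd.repartition(200)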
Lesson 4: Speed!
• Cache, but cache wisely
• If you use it twice, cache it
• Broadcast variables
• Visible to all executors
• Only serialized once
• Blazing-fast lookups!
• Threading
• Thread pool on driver
• Fast operations, many tasks
• 75x speedup over MLlib ALS predict()
• Start: 1 rec / 1.5 seconds
• End: 50 recs / second
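A hedged sketch combining the broadcast and threading ideas above (assuming a live SparkContext sc; the toy lookup table stands in for the ALS model):

from concurrent.futures import ThreadPoolExecutor

# Serialized once, then visible to every executor for fast lookups.
model = sc.broadcast({user: user * 0.5 for user in range(1000)})

def predict(user_id):
    # Each call launches a small Spark job; SparkContext job submission
    # is thread-safe, so jobs from different threads run concurrently.
    return (sc.parallelize([user_id])
              .map(lambda u: (u, model.value[u]))
              .collect())

with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(predict, range(100)))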
screen pyspark \
  --driver-memory 100g \
  --num-executors 60 \
  --executor-cores 5 \
  --master yarn-client \
  --conf "spark.executor.memory=20g" \
  --conf "spark.io.compression.codec=lz4" \
  --conf "spark.shuffle.consolidateFiles=true" \
  --conf "spark.dynamicAllocation.enabled=false" \
  --conf "spark.shuffle.manager=tungsten-sort" \
  --conf "spark.akka.frameSize=1028" \
  --conf "spark.executor.extraJavaOptions=-Xss256k -XX:MaxPermSize=128m -XX:PermSize=96m -XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=6 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AggressiveOpts -XX:+UseCompressedOops"
Questions?
References
• http://spark.apache.org/docs/latest/programming-guide.html
• http://spark.apache.org/docs/latest/sql-programming-guide.html
• http://tinyurl.com/leqek2d (Working With Spark, by Ilya Ganelin)
• http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/ (by Sandy Ryza)
• http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ (by Sandy Ryza)
• http://www.slideshare.net/ilganeli/frustrationreduced-pyspark-data-engineering-with-dataframes
• http://www.amazon.com/Spark-Data-Cluster-Computing-Production/dp/1119254019