edureka
www.edureka.co/apache-spark-scala-training
What will you learn today ?
Beyond Hadoop MapReduce
How Spark is better than MapReduce?
Benchmark : Spark vs MapReduce
Hands-On : Analyzing data with Spark
Word Count Problem - MapReduce
MapReduce Code for a Simple Word Count Problem
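The code from this slide is not reproduced in the text. As a rough illustration only, the three phases of a MapReduce word count can be sketched in plain Python. This is not Hadoop's API: the function names `map_phase`, `shuffle_phase` and `reduce_phase` are hypothetical, and everything runs in a single process.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Chain the three phases together."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
print(counts["the"], counts["fox"], counts["dog"])  # prints "3 2 1"
```

In a real Hadoop job each phase is a separate class, and the job also needs driver and configuration code, which is the verbosity the slide is pointing at.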
Apache Spark
Apache Spark is a general-purpose data processing engine with in-memory computing
Spark provides APIs for Scala, Java, Python and R, which has helped make it widely adopted for data processing
How Spark fits into Hadoop Ecosystem ?
Spark is intended to enhance, not replace, the Hadoop stack
Spark is designed to read and write data in HDFS as well as in other data sources such as CSV files, Amazon S3 and NoSQL databases
Word Count Problem - Spark
Spark Scala Code for Word Count Problem
Spark Python Code for Word Count Problem
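The slide's code is not reproduced in the text, so the following is a minimal sketch of what a Python word count might look like with Spark's RDD API, not the presenter's exact code. The helper `tokenize`, the function names, and the `local[*]` master setting are assumptions made for illustration. `textFile`, `flatMap`, `map` and `reduceByKey` are standard RDD operations.

```python
import re
from operator import add


def tokenize(line):
    """Split a line into lowercase words (hypothetical helper, not part of Spark)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def word_count(sc, path):
    """Return an RDD of (word, count) pairs for the text file at `path`.

    `sc` is an existing pyspark.SparkContext; `path` is any URI Spark can read.
    """
    return (
        sc.textFile(path)               # one record per line
        .flatMap(tokenize)              # one record per word
        .map(lambda word: (word, 1))    # pair each word with a count of 1
        .reduceByKey(add)               # sum the counts per word
    )


def main(path):
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark import SparkContext

    # Assumption: local mode with all cores, purely for illustration.
    sc = SparkContext("local[*]", "WordCount")
    try:
        for word, count in word_count(sc, path).collect():
            print(word, count)
    finally:
        sc.stop()
```

Calling `main` with the path to a text file runs the job. The whole pipeline is four chained calls, with the shuffle handled inside `reduceByKey`.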
Processing data with Spark takes far less code than with MapReduce, and Spark gives you the flexibility to choose your preferred language: Scala, Java, Python, etc.
Why Spark for Big Data Analytics ?
What makes Spark suitable for Big Data Analytics ?
The following features make Spark a strong fit for Big Data Analytics :
Spark simplifies data analysis
Spark provides built-in libraries to do advanced analytics
Spark speaks more than one language
Spark provides faster results
Spark allows you to use different Hadoop vendors
Isn’t Spark In-Memory Only?
But I have heard Spark is good only for in-memory processing?
Spark : Best of both Worlds
It’s a common misconception that Spark is only for in-memory processing. From its inception, Spark was designed to be a general execution engine that works both in-memory and on-disk. Almost all Spark operators perform external (disk-backed) operations when the data does not fit in memory.
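To make the idea of an "external" operation concrete, here is a small plain-Python sketch of an external merge sort: values are buffered in memory, spilled to disk as sorted runs when the buffer fills, and merged at the end. This is not Spark's implementation. The function name `external_sort` and the `max_in_memory` parameter are hypothetical, chosen only to illustrate the spill-to-disk pattern.

```python
import heapq
import os
import tempfile


def external_sort(values, max_in_memory=1000):
    """Yield ints from `values` in sorted order, buffering at most
    `max_in_memory` of them before spilling a sorted run to disk."""
    run_paths = []
    buffer = []

    def spill():
        # Sort the in-memory buffer and write it to disk as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    files = [open(p) for p in run_paths]
    try:
        # heapq.merge lazily merges the sorted runs, reading them line by line.
        yield from heapq.merge(*((int(line) for line in f) for f in files))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


result = list(external_sort([5, 3, 9, 1, 7, 2], max_in_memory=2))
print(result)  # prints [1, 2, 3, 5, 7, 9]
```

With `max_in_memory=2`, the six inputs are written as three sorted runs and merged from disk. The same trade of memory for disk I/O is what lets an engine keep working when a dataset outgrows RAM.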
Spark Libraries
Spark SQL : Spark’s module for working with structured data
MLlib : Spark’s machine learning library
GraphX : Spark’s API for graph computation
Spark Streaming : Spark’s API to process streaming data
Spark Use Cases
Companies use Spark to solve a variety of problems, e.g. recommendation systems, business intelligence and fraud detection.
Who is using Spark?
A complete list of companies using Spark can be found here : https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark
Spark is here to stay
Spark is not one of those "here today, gone tomorrow" technologies. It is here to stay for the foreseeable future, and it is well worth getting your teeth into it in order to get value out of your data.
References
IBM backs Apache Spark for Big Data Analytics :
http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/
Why Cloudera is saying 'Goodbye, MapReduce' and 'Hello, Spark' :
http://fortune.com/2015/09/09/cloudera-spark-mapreduce/
5 reasons to turn to Spark for Big Data Analytics :
http://www.infoworld.com/article/2897287/big-data/5-reasons-to-turn-to-spark-for-big-data-analytics.html
Spark new record for large scale sorting :
https://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
How eBay uses Spark to ignite Data Analytics :
http://www.ebaytechblog.com/2014/05/28/using-spark-to-ignite-data-analytics/
Spark is fast on disk too :
https://gigaom.com/2014/10/10/databricks-demolishes-big-data-benchmark-to-prove-spark-is-fast-on-disk-too/
Survey
Your feedback is vital for us, be it a compliment, a suggestion or a complaint. It helps us to make your experience better!
Please spare a few minutes to take the survey after the webinar.