http://www.edureka.co/apache-spark-scala-training
Spark Will Replace Hadoop! Know Why?
Slide 2
Agenda
At the end of the session, you will be able to:
Understand why to learn Spark
Know the advantages of Spark and the results of the 2015 Spark survey
Discover the Spark career path
Understand how companies are using Spark
Slide 3
Why Spark?
Slide 4
Rise of Big Data
IDC (International Data Corporation) predicts that by 2020 the world's data will have reached 40,000 exabytes (EB), or 40 zettabytes (ZB).
The world’s information is doubling every two years. By 2020, there will be 5,200 GB of data for every person on Earth.
[Chart: growth of structured and unstructured data, 2005 to 2015]
Slide 5
Application of Big Data
Source: Twitter
Slide 6
Application of Big Data
Slide 7
Hadoop is not Enough!
Limitations:
Real-time Processing: Hadoop MapReduce is limited to batch processing. Real-time processing was a big “No” in Hadoop.
Not Fast Enough: Hadoop MapReduce is fast, but not fast enough.
Conclusion:
Real-time processing is essential and can be achieved using Spark!
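To make the batch versus real-time distinction concrete, here is a minimal plain-Python sketch. It is not Hadoop or Spark code: the function names `batch_word_count` and `micro_batch_word_count` and the sample data are hypothetical, chosen only for illustration. The batch version returns a result only after the whole data set has been read, while the micro-batch version yields an updated running result after each small batch arrives, which is roughly the processing model Spark Streaming follows.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Counter:
    """Batch style: read the complete data set, then return one final result."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def micro_batch_word_count(batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Micro-batch style: yield an updated running count after every batch."""
    running = Counter()
    for batch in batches:
        for line in batch:
            running.update(line.split())
        # A snapshot is available as soon as this small batch is processed.
        yield Counter(running)


# Hypothetical sample data, for illustration only.
stream = [["spark is fast"], ["hadoop is batch"], ["spark is general"]]
all_lines = [line for batch in stream for line in batch]

final_counts = batch_word_count(all_lines)
snapshots = list(micro_batch_word_count(stream))
```

Here `snapshots` holds one intermediate result per batch, and the last snapshot equals `final_counts`. The difference is when results become available, not what the final answer is.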
Slide 8
Spark Survey and its Advantages
Slide 9
Spark Survey 2015!
Source: Typesafe
Slide 10
Advantages of Spark
Ease of Use
Generality
Runs Everywhere
Speed: up to 100x faster than MapReduce (MR) for in-memory workloads
Slide 11
Feature Comparison
Hadoop MapReduce        Spark
Fast                    Up to 100x faster than MapReduce
Batch processing        Batch and real-time processing
Stores data on disk     Stores data in memory
Open source             Open source
Written in Java         Written in Scala
Source: Databricks
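The "stores data on disk" versus "stores data in memory" row matters most for jobs that make several passes over the same data. The plain-Python sketch below illustrates the idea under simple assumptions. It is not Hadoop or Spark code, and the helper names (`load_numbers`, `sum_rereading_disk`, `sum_cached_in_memory`) and the sample file are hypothetical. In Spark the comparable mechanism is caching a data set in memory so that later passes reuse it.

```python
import os
import tempfile
from typing import List, Tuple


def load_numbers(path: str) -> List[float]:
    """Read and parse one number per line from a file on disk."""
    with open(path) as handle:
        return [float(line) for line in handle]


def sum_rereading_disk(path: str, passes: int) -> Tuple[float, int]:
    """Re-read the file on every pass, as a chain of disk-based jobs would."""
    reads = 0
    total = 0.0
    for _ in range(passes):
        data = load_numbers(path)  # disk I/O repeated on every pass
        reads += 1
        total = sum(data)
    return total, reads


def sum_cached_in_memory(path: str, passes: int) -> Tuple[float, int]:
    """Load once, then reuse the in-memory copy on every pass."""
    data = load_numbers(path)  # disk I/O happens a single time
    reads = 1
    total = 0.0
    for _ in range(passes):
        total = sum(data)
    return total, reads


# Hypothetical sample file, created only for this illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "numbers.txt")
    with open(path, "w") as handle:
        handle.write("1\n2\n3\n4\n")
    disk_total, disk_reads = sum_rereading_disk(path, passes=5)
    mem_total, mem_reads = sum_cached_in_memory(path, passes=5)
```

Both versions compute the same total, but the first touches the disk once per pass while the second touches it once overall. How much that saves in practice depends on the workload and on whether the data fits in memory.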
Slide 12
Spark Features/Modules in Demand
Source: Typesafe
Slide 13
New Features in 2015
Data Frames
• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3
SparkR
• Released in Spark 1.4
• Exposes DataFrames, RDDs & ML library in R
Machine Learning Pipelines
• High Level API
• Featurization
• Evaluation
• Model Tuning
External Data Sources
• Platform API to plug Data-Sources into Spark
• Pushes logic into sources
Source: Databricks
Slide 14
Spark Career Path
Slide 15
Job Roles & Industry Focus
Source: Typesafe
Slide 16
Salary Trends
Slide 17
Major Companies Using Hadoop
Slide 18
Industry Adoption
Source: Typesafe
Slide 19
How Companies are using Spark?
Slide 20
General Business Goals
Source: Typesafe
Demo
Slide 22
The Big Question!
Is Spark going to replace Hadoop?
Slide 23
Answer: Yes, Spark will be used on top of Hadoop and will replace MapReduce.
Reasons:
1. Hadoop MapReduce cannot handle real-time processing
2. Hadoop MapReduce is slower than Spark
3. With the rise of IoT, Spark is a must
Questions
Slide 24
Slide 25
Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us to make the course better!
Please spare a few minutes to take the survey after the webinar.
Survey