© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Predicting Consumer Behaviour via Hadoop
Session Objectives
In this session you will understand:
ᗍ Big Data and Hadoop
ᗍ HDFS
ᗍ MapReduce with examples and scenarios
ᗍ Predictive Analytics and its process
ᗍ Three Pillars of Predictive Analytics
ᗍ Applications of Predictive Analytics
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information
It is very difficult to manage data at this scale
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and utilize such huge volumes of data?
Today, managing such BIG DATA is becoming a problem for all of us
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model
It is an open-source data management technology with scale-out storage and distributed processing
Hadoop Characteristics
ᗍ Flexible
ᗍ Reliable
ᗍ Economical
ᗍ Scalable
Hadoop Ecosystem
[Diagram: components of the Hadoop ecosystem]
ᗍ Flume: unstructured or semi-structured data
ᗍ Sqoop: structured data (import or export)
ᗍ Apache Oozie: workflow
ᗍ Pig Latin: data analysis
ᗍ Hive: DW system
ᗍ MapReduce Framework
ᗍ HBase
ᗍ Other YARN Frameworks (MPI, Giraph)
ᗍ YARN: cluster resource management
ᗍ HDFS (Hadoop Distributed File System)
Predictive Analysis
[Diagram: Data → Capture → Predict → Act]
ᗍ Data: sources, types, forms
ᗍ Capture
ᗍ Predict: Data Mining, Text Mining, Statistical Analytics
ᗍ Act: act on the model
Why Predictive Analytics?
ᗍ Predictive analytics automatically synthesizes big data, mathematical sciences, business rules, and machine learning to make predictions and then suggests decision options to take advantage of a future opportunity
ᗍ The purpose of predictive analytics is to tell you what will happen in the future
ᗍ Predictive Analytics is a branch of the Data Mining process
ᗍ An example of using predictive analytics is optimizing customer relationship management systems
Predictive Analytics – Process
1. Extract data needed
2. Draw Hypothesis
3. Check the data fits the tool
4. Run Analysis
5. Draw Conclusions
6. Implement Results
7. Monitor Progress
Three Pillars of Predictive Analytics
ᗍ Predictive Operational Analytics: Plan, Manage, Maximize
ᗍ Predictive Threat and Fraud Analytics: Monitor, Detect, Control
ᗍ Predictive Customer Analytics: Acquire, Grow, Retain
Most Common Predictive Modelling Tasks
ᗍ Classification
ᗍ Clustering
ᗍ Association
ᗍ Detection
ᗍ Estimation and Time Series
ᗍ Link Analysis
ᗍ Web and Text Mining
Applications of Predictive Analytics
ᗍ Analytical customer relationship management (CRM)
ᗍ Clinical decision support systems
ᗍ Customer retention
ᗍ Direct marketing
ᗍ Fraud detection
ᗍ Risk management
ᗍ Underwriting
What is Predictive Analytics all about?
Predictive analytics is really about solving problems with data
Predictive Analytics is the technology that learns from experience (data) to predict the future behaviour of individuals in order to drive better decisions
Predictive Analytics helps to connect data to effective action by drawing reliable conclusions about current conditions and future events
It enables businesses to use predictive models that exploit patterns found in historical data to identify potential risks and opportunities before they occur
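As a rough illustration of learning from historical data and then acting on the result, here is a minimal Python sketch. The records, segment names, and the 0.5 threshold are all made up for this example, and a per-segment frequency count stands in for a real predictive model.

```python
from collections import defaultdict

# Made-up historical records: (customer_segment, churned)
HISTORY = [
    ("prepaid", True), ("prepaid", True), ("prepaid", False),
    ("postpaid", False), ("postpaid", False), ("postpaid", True),
    ("postpaid", False),
]


def learn_churn_rates(history):
    """Learn from experience: observed churn rate per segment."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for segment, did_churn in history:
        totals[segment] += 1
        churned[segment] += int(did_churn)
    return {s: churned[s] / totals[s] for s in totals}


def at_risk(rates, threshold=0.5):
    """Act on the model: segments whose estimated churn exceeds the threshold."""
    return sorted(s for s, rate in rates.items() if rate > threshold)


if __name__ == "__main__":
    rates = learn_churn_rates(HISTORY)
    print(rates)
    print(at_risk(rates))
```

A real project would replace the frequency count with one of the modelling tasks listed earlier, such as classification, but the learn-then-act shape stays the same.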
Map Reduce – Scenario
Let us consider a real-life scenario to understand the importance of “Map Reduce” in Hadoop
Suppose you are handling a project which has x tasks and takes 100 hours for one resource to complete
1 resource x 100 hours = 100 hours
With 10 resources working in parallel: 100 / 10 (resources) = 10 hours
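The arithmetic above can be sketched in a few lines of Python. The function name `ideal_hours` is made up for this illustration, and it assumes the work divides evenly with no coordination overhead, which real clusters only approximate.

```python
def ideal_hours(total_hours: float, resources: int) -> float:
    """Best-case elapsed time when work is split evenly across resources."""
    if resources < 1:
        raise ValueError("need at least one resource")
    return total_hours / resources


if __name__ == "__main__":
    print(ideal_hours(100, 1))   # one resource doing all the work
    print(ideal_hours(100, 10))  # ten resources working in parallel
```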
Map Reduce – Scenario
Similarly, a job that takes 100 hours on its own takes 100/10 = 10 hours when split 10 ways
More Scenarios on Map-Reduce
Problem Statement: Find the maximum stock market levels recorded in a span of 5 years
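One way to frame this problem as a MapReduce job is sketched below in plain Python, simulating the phases in memory rather than running on Hadoop. The record layout (`date,symbol,level`), the sample rows, and the function names are assumptions for illustration. The mapper keys each record by stock symbol and the reducer keeps the maximum.

```python
from collections import defaultdict

# Hypothetical input lines: "date,symbol,level" (made-up values).
RECORDS = [
    "2010-03-01,ABC,101.5",
    "2012-07-15,ABC,140.2",
    "2014-11-30,ABC,133.0",
    "2011-01-10,XYZ,55.0",
    "2013-09-09,XYZ,72.4",
]


def map_record(line):
    """Map: emit (symbol, level) for one input line."""
    _date, symbol, level = line.split(",")
    yield symbol, float(level)


def reduce_max(symbol, levels):
    """Reduce: keep the highest level seen for a symbol."""
    return symbol, max(levels)


def run_job(lines):
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for line in lines:
        for key, value in map_record(line):
            grouped[key].append(value)
    return dict(reduce_max(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```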
Problem Statement: De-identify personal identifier information
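De-identification fits a map-only job, since each record can be masked independently of the others. The sketch below is a minimal Python illustration, not a compliance-grade approach: the column positions, the sample record, and the placeholder key are all hypothetical, and a keyed hash (HMAC-SHA256) is just one possible masking choice.

```python
import hashlib
import hmac

# Hypothetical column positions holding personal identifiers (name, phone).
IDENTIFIER_COLUMNS = {1, 2}
SECRET_KEY = b"replace-with-a-real-secret"  # placeholder, not a real key


def mask(value: str) -> str:
    """Replace a value with a keyed hash so the original is not stored."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def map_record(line: str) -> str:
    """Map: mask identifier columns, pass the other columns through."""
    fields = line.split(",")
    return ",".join(
        mask(f) if i in IDENTIFIER_COLUMNS else f
        for i, f in enumerate(fields)
    )


if __name__ == "__main__":
    print(map_record("1001,Jane Doe,555-0100,diabetes"))
```

Because the same input always produces the same masked value, records for one person can still be joined after masking without exposing the identifier itself.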
Traditional Solution
[Diagram: Very Big Data is split into chunks (Split Data); a separate grep runs on each chunk and produces matches; cat combines the per-chunk matches into All matches]
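The split, grep, and cat steps of the traditional approach might look like the Python sketch below. The helper names and the thread pool are assumptions for illustration. The point is that the splitting, the parallel runs, and the final concatenation all have to be coordinated by hand.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_data(lines, parts):
    """Split the input into roughly equal chunks."""
    size = max(1, -(-len(lines) // parts))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep(pattern, chunk):
    """Return the lines of one chunk that match the pattern."""
    regex = re.compile(pattern)
    return [line for line in chunk if regex.search(line)]


def search(lines, pattern, parts=4):
    chunks = split_data(lines, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # One grep per chunk, run concurrently.
        partial = pool.map(lambda c: grep(pattern, c), chunks)
        # "cat": concatenate the per-chunk matches in chunk order.
        return [line for matches in partial for line in matches]


if __name__ == "__main__":
    log = ["error: disk", "ok", "error: net", "ok", "warn", "error: cpu"]
    print(search(log, "error"))
```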
MapReduce Solution
[Diagram: Very Big Input is split into chunks (Split Data); within the MapReduce Framework, MAP tasks process the chunks and REDUCE combines their output into All matches]
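With MapReduce, the same search reduces to writing a map function and a reduce function, and the framework handles the splitting and collection. The Python sketch below simulates that division of labour in memory. `run_mapreduce` is a hypothetical stand-in for the framework, not a Hadoop API, and counting duplicate matches in the reducer is an illustrative choice.

```python
import re
from collections import defaultdict


def grep_map(pattern, line):
    """Map: emit (line, 1) if the line matches, else nothing."""
    if re.search(pattern, line):
        yield line, 1


def grep_reduce(line, counts):
    """Reduce: report each matching line with how often it occurred."""
    return line, sum(counts)


def run_mapreduce(splits, pattern):
    """Stand-in for the framework: run maps, group by key, run reduces."""
    grouped = defaultdict(list)
    for split in splits:  # each split would be handled by its own map task
        for line in split:
            for key, value in grep_map(pattern, line):
                grouped[key].append(value)
    return [grep_reduce(k, v) for k, v in sorted(grouped.items())]


if __name__ == "__main__":
    splits = [["error: disk", "ok"], ["error: net", "error: disk"]]
    print(run_mapreduce(splits, "error"))
```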
MapReduce Advantages
Two biggest advantages:
ᗍ Takes processing to the data
ᗍ Allows processing data in parallel
[Diagram: Map Tasks run against HDFS Blocks (a, b, c) on a Node, within a Rack, within a Data Center]
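"Taking processing to the data" means preferring to run a map task on a node that already holds the HDFS block, then on a node in the same rack, and only then anywhere else. The Python sketch below is a heavily simplified illustration of that preference order. The `Node` class and `pick_node` function are hypothetical, and a real scheduler weighs many more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str


def locality(node, replica_nodes):
    """Rank a candidate node: 0 = node-local, 1 = rack-local, 2 = off-rack."""
    if node in replica_nodes:
        return 0
    if any(node.rack == replica.rack for replica in replica_nodes):
        return 1
    return 2


def pick_node(free_nodes, replica_nodes):
    """Choose the free node closest to a replica of the HDFS block."""
    return min(free_nodes, key=lambda n: locality(n, replica_nodes))


if __name__ == "__main__":
    a, b, c = Node("a", "rack1"), Node("b", "rack1"), Node("c", "rack2")
    print(pick_node([c, b], replica_nodes=[a]))
```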
MapReduce Flow
1. Input data is present in data nodes
2. Number of map tasks = number of input splits
3. Mappers produce intermediate data
4. Data is exchanged among nodes in “shuffling”
5. All data of the same key goes to the same reducer
6. Reducer output is stored at the output location
[Diagram: INPUT DATA feeds Map tasks on Node 1 and Node 2, whose output feeds Reduce tasks]
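The six steps above can be traced in a small in-memory simulation. The sketch below uses word count, a common teaching example that is not taken from these slides, and the reducer count and function names are assumptions. The partition function is kept deterministic so that a given key always lands on the same reducer.

```python
from collections import defaultdict

NUM_REDUCERS = 2  # assumed value for the illustration


def map_task(split):
    """Steps 2-3: one map task per input split emits intermediate pairs."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def partition(key):
    """Step 5: the same key always maps to the same reducer."""
    return sum(key.encode("utf-8")) % NUM_REDUCERS


def shuffle(map_outputs):
    """Step 4: route intermediate pairs to reducers, grouped by key."""
    reducers = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for pairs in map_outputs:
        for key, value in pairs:
            reducers[partition(key)][key].append(value)
    return reducers


def reduce_task(grouped):
    """Step 6: each reducer produces one result per key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    map_outputs = [map_task(s) for s in splits]  # step 1: splits are the input
    outputs = [reduce_task(g) for g in shuffle(map_outputs)]
    merged = {}
    for part in outputs:
        merged.update(part)
    return merged


if __name__ == "__main__":
    print(word_count([["big data", "hadoop"], ["Big Hadoop hadoop"]]))
```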
Job Trends – Hadoop
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to Map Reduce
Module 4: Advanced Map Reduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
Why SkillSpeed?
ᗍ Course Curriculum from Industry Experts
ᗍ Instructor Led Live Virtual Sessions
ᗍ Lifetime access to Course Content via LMS
ᗍ 100% Placement Assistance
ᗍ 24x7 Support
Corporate Partners
Contact us
Lines open 24/7
To know more about the course, please contact:
IND: +91-90660-20904
USA: 1866-607-6547 (Toll Free)
Or reach us at [email protected]