Slide 1© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Big Data Analytics using Pig
Scope of this Presentation: Big Data Analytics via Pig
ᗍ Introduction to Big Data and Hadoop
ᗍ Introduction to Pig
ᗍ Hadoop Pig Architecture
ᗍ Big Data Analytics via Pig
ᗍ Big Data & Hadoop Job Trends
ᗍ Big Data & Hadoop Course Syllabus
Get Started with BIG Data & Hadoop
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications
Systems and enterprises generate huge amounts of data, from terabytes to petabytes
Managing data at this scale is very difficult with traditional tools
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data? Today, managing big data is becoming a problem for all of us.
Hadoop can be used to process such huge data sets easily. How?
Before answering that, let's understand what Hadoop is.
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model
It is an open-source data management technology with scale-out storage and distributed processing
Hadoop Characteristics
ᗍ Flexible
ᗍ Reliable
ᗍ Economical
ᗍ Scalable
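The "simple programming model" mentioned above is MapReduce: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. Below is a minimal single-process Python sketch of that model, using word count as an assumed example task; the function names are illustrative, and real Hadoop runs each phase in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: collect all values emitted for the same key
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values for one key into a single result
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data flows"]))
```

Each phase only sees its own input, which is what lets Hadoop split the map and reduce work across many machines.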
Hadoop Ecosystem
ᗍ Flume: import of unstructured or semi-structured data
ᗍ Sqoop: import or export of structured data
ᗍ Apache Oozie: workflow
ᗍ HDFS (Hadoop Distributed File System)
ᗍ Pig Latin: data analysis
ᗍ Hive: data warehouse (DW) system
ᗍ MapReduce Framework
ᗍ HBase
ᗍ Other YARN frameworks (MPI, Giraph)
ᗍ YARN: cluster resource management
Need for Pig
ᗍ Java is not a preferred language for many data analysts
ᗍ Roughly 200 lines of Java code correspond to about 10 lines of Pig Latin
ᗍ Many built-in operators are available for common data operations such as join, grouping and filtering
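To see what two of these built-in operators do, here is a small Python sketch that mimics the semantics of Pig's `FILTER visits BY user == 'John';` and `GROUP visits BY user;` on a few rows of the Visits data used later in this deck. The helper names `pig_filter` and `pig_group` are hypothetical; the point is that GROUP returns one tuple per key holding the key and a bag of all matching tuples.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str, str]  # (user, url, time), as in the Visits example


def pig_filter(relation: List[Record], keep: Callable[[Record], bool]) -> List[Record]:
    # Mimics FILTER: keep only the tuples for which the condition holds
    return [t for t in relation if keep(t)]


def pig_group(relation: List[Record], key_index: int) -> List[Tuple[str, List[Record]]]:
    # Mimics GROUP ... BY: one output tuple per distinct key, pairing the key
    # with a bag (here, a list) of every input tuple that has that key.
    # Groups come out in first-seen order; Pig itself does not guarantee an order.
    bags: Dict[str, List[Record]] = {}
    for t in relation:
        bags.setdefault(t[key_index], []).append(t)
    return list(bags.items())


visits = [
    ("John", "www.cbn.com", "7:00"),
    ("John", "www.trap.com", "7:05"),
    ("Linda", "cnn.com/index.htm", "11:00"),
]
by_user = pig_group(visits, 0)
johns_visits = pig_filter(visits, lambda t: t[0] == "John")
print(by_user)
print(johns_visits)
```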
Where to use Pig?
Pig is a data flow language, so it is well suited to:
ᗍ Quickly changing data processing requirements
ᗍ Processing data from multiple channels
ᗍ Quick hypothesis testing
ᗍ Time-sensitive data refreshes
ᗍ Data profiling using sampling
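Data profiling by sampling works by looking at a random subset instead of the full data set. Pig's `SAMPLE` operator (for example `SAMPLE visits 0.1;`) keeps each tuple with the given probability, so the sample size is only approximate. The Python sketch below mimics that behaviour; the function name `sample` and the optional seed are assumptions added for illustration and reproducibility.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample(relation: Sequence[T], fraction: float, seed: Optional[int] = None) -> List[T]:
    # Keep each record independently with probability `fraction`; the result
    # size is therefore only approximately fraction * len(relation)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [record for record in relation if rng.random() < fraction]


# Profile roughly 10% of a larger relation instead of scanning all of it
rows = list(range(1000))
profile_rows = sample(rows, 0.1, seed=42)
print(len(profile_rows), "of", len(rows), "rows sampled")
```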
What is Pig?
ᗍ Pig is an open-source data flow platform
ᗍ Its language, Pig Latin, is used to express queries and data manipulation operations as simple scripts
ᗍ Pig converts these scripts into a sequence of underlying MapReduce jobs
Let’s internalize Pig
Example: let's find the users who, overall, visit highly ranked pages
Visits
User    URL                 Time
John    www.cbn.com         7:00
John    www.trap.com        7:05
John    www.myblog.com      9:00
John    www.flickr.com      9:05
Linda   cnn.com/index.htm   11:00

Pages
Page URL          Page Rank
www.cbn.com       0.9
www.flickr.com    0.9
www.myblog.com    0.6
www.trap.com      0.3
Internalizing Pig
1. Load Visits (user, url, time)
2. Load Pages (url, pagerank)
3. Join on url = url
4. Group by user
5. Compute average pagerank
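In Pig Latin this flow maps onto `LOAD`, `JOIN`, `GROUP` and `FOREACH ... GENERATE` with `AVG`. The Python sketch below walks through the same steps on the Visits and Pages data above, assuming an inner join (Pig's default for `JOIN`), so Linda's visit to cnn.com/index.htm, which has no page rank, drops out. The final `good_users` step and its 0.5 threshold are assumptions added only to show how "highly ranked" could be decided; they are not on the slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Sample data copied from the Visits and Pages tables above
visits = [
    ("John", "www.cbn.com", "7:00"),
    ("John", "www.trap.com", "7:05"),
    ("John", "www.myblog.com", "9:00"),
    ("John", "www.flickr.com", "9:05"),
    ("Linda", "cnn.com/index.htm", "11:00"),
]
pages = [
    ("www.cbn.com", 0.9),
    ("www.flickr.com", 0.9),
    ("www.myblog.com", 0.6),
    ("www.trap.com", 0.3),
]


def average_pagerank_per_user(
    visits: List[Tuple[str, str, str]], pages: List[Tuple[str, float]]
) -> Dict[str, float]:
    # Join url = url: an inner join, so visits to pages without a rank are dropped
    rank_by_url = dict(pages)
    joined = [(user, rank_by_url[url]) for user, url, _time in visits if url in rank_by_url]
    # Group by user
    ranks_by_user: Dict[str, List[float]] = defaultdict(list)
    for user, rank in joined:
        ranks_by_user[user].append(rank)
    # Compute the average pagerank of each group
    return {user: sum(ranks) / len(ranks) for user, ranks in ranks_by_user.items()}


def good_users(averages: Dict[str, float], threshold: float = 0.5) -> List[str]:
    # Assumed final step (not on the slide): keep users whose average exceeds a threshold
    return sorted(user for user, avg in averages.items() if avg > threshold)


averages = average_pagerank_per_user(visits, pages)
print(averages)
print(good_users(averages))
```

John's average works out to (0.9 + 0.3 + 0.6 + 0.9) / 4 = 0.675, up to floating-point rounding in the printed value.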
Pig in Industry
Since Pig is a data flow language, it is naturally suited to:
ᗍ Data factory operations
ᗍ Typically, data is brought from multiple servers into HDFS
ᗍ Pig is used to clean and preprocess the data
ᗍ It helps data analysts and researchers quickly prototype their theories
ᗍ Since Pig is extensible, data analysts can easily run programs written in scripting languages (such as Ruby or Python) against large data sets
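Cleaning and extensibility often meet in a user-defined function (UDF). The sketch below is a Python cleaning function that could be registered with Pig, for example with `REGISTER 'clean.py' USING jython AS clean;` and then called as `clean.normalize_host(url)` inside a `FOREACH ... GENERATE`. The file name `clean.py`, the function `normalize_host` and its normalization rule are all hypothetical. Pig's Python UDF support provides an `outputSchema` decorator (CPython streaming UDFs import it from `pig_util`); the fallback below exists only so the file also runs as plain Python, and the code avoids newer syntax so it stays usable under Jython.

```python
try:
    outputSchema  # already defined when Pig runs this file as a Jython UDF
except NameError:
    try:
        from pig_util import outputSchema  # CPython (streaming_python) UDFs
    except ImportError:
        def outputSchema(schema):
            # Fallback no-op decorator so the file also runs as plain Python
            def wrap(func):
                return func
            return wrap


@outputSchema("host:chararray")
def normalize_host(url):
    """Hypothetical cleaning rule: reduce a URL to a lower-case host name."""
    if url is None:
        return None
    s = url.strip().lower()
    if "://" in s:  # drop a scheme such as "http://"
        s = s.split("://", 1)[1]
    host = s.split("/", 1)[0]  # drop any path such as "/index.htm"
    if host.startswith("www."):
        host = host[len("www."):]
    return host or None


print(normalize_host("cnn.com/index.htm"))
print(normalize_host("www.flickr.com"))
```

Returning `None` for empty or missing input lets Pig treat those rows as nulls, which a later `FILTER ... BY host IS NOT NULL` could remove.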
Ways to Run Pig
ᗍ Grunt Mode:
• Pig's interactive shell
• Very useful for testing, syntax checking and ad hoc data exploration
ᗍ Script Mode:
• Runs a set of instructions from a file
• Similar to a SQL script file
ᗍ Embedded Mode:
• Executes Pig programs from a Java program
• Suitable for creating Pig scripts on the fly
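Embedded mode itself drives Pig from Java. To keep one language across these examples, the Python sketch below shows only the "create Pig scripts on the fly" idea: it assembles Pig Latin text from parameters and saves it as a file that could then be run in script mode (for example `pig generated.pig`). The function names, the comma-delimited input `visits.csv` and the output directory `john_visits` are assumptions for illustration.

```python
import os
import tempfile
from typing import Sequence


def build_filter_script(input_path: str, fields: Sequence[str], condition: str, output_path: str) -> str:
    # Assemble a three-statement Pig Latin script (LOAD, FILTER, STORE) as plain text
    schema = ", ".join(fields)
    lines = [
        "data = LOAD '{}' USING PigStorage(',') AS ({});".format(input_path, schema),
        "kept = FILTER data BY {};".format(condition),
        "STORE kept INTO '{}' USING PigStorage(',');".format(output_path),
    ]
    return "\n".join(lines) + "\n"


def write_script(text: str, directory: str, name: str = "generated.pig") -> str:
    # Save the generated script so it can be run later in script mode
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(text)
    return path


script = build_filter_script(
    "visits.csv",
    ["user:chararray", "url:chararray", "time:chararray"],
    "user == 'John'",
    "john_visits",
)
print(script)
with tempfile.TemporaryDirectory() as tmp:
    print("written to", write_script(script, tmp))
```

The condition string is pasted into the script verbatim, so it should come from trusted code rather than from end-user input.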
Modes of Pig
Each of these ways of running Pig can use either of the following execution modes:
Local mode
ᗍ The entire Pig job runs as a single JVM process
ᗍ Data is read from and stored to local Linux paths
MapReduce mode
ᗍ The Pig job runs as a series of MapReduce jobs on the Hadoop cluster
ᗍ Input and output paths are treated as HDFS paths
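The mode is chosen on the command line with Pig's `-x` option (`local` or `mapreduce`, the latter being the default). The small Python sketch below builds that command for a hypothetical script `visits.pig`; the helper name `pig_command` is an assumption, and it deliberately accepts only the two modes described on this slide.

```python
from typing import List

# The two execution modes described on this slide, as values for Pig's -x option
MODES = ("local", "mapreduce")


def pig_command(script_path: str, mode: str = "mapreduce") -> List[str]:
    # local:     whole job in one JVM, paths refer to the local file system
    # mapreduce: job runs as MapReduce jobs on the cluster, paths refer to HDFS
    if mode not in MODES:
        raise ValueError("mode must be one of {}".format(MODES))
    return ["pig", "-x", mode, script_path]


print(" ".join(pig_command("visits.pig", "local")))
print(" ".join(pig_command("visits.pig")))
```

On a machine with Pig installed, the returned list could be passed to `subprocess.run`. Running `pig -x local` without a script starts the Grunt shell in local mode instead.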
Pig Components
1. Pig Data Flows: Pig Latin is used to express data flows
2. Execution Environments:
ᗍ Local execution in a single JVM
ᗍ Distributed execution on a Hadoop cluster
Pig Programs Execution
ᗍ Pig is a wrapper on top of the MapReduce layer
ᗍ It parses and optimizes the Pig script, then turns its transformations into a series of MapReduce jobs
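To make that translation concrete, here is a simplified sketch of how joining Visits and Pages on url can run as one MapReduce job, known as a reduce-side join: mappers key every record by url and tag it with its source, the shuffle brings records with the same url together, and the reducer pairs them up. Pig's default `JOIN` works along these lines, but this single-process Python version, with hypothetical function names, leaves out the optimizations in Pig's real physical plan.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple

Tagged = Tuple[str, Any]  # ("V", user) for a Visits record, ("P", pagerank) for a Pages record


def map_visit(user: str, url: str, time: str) -> Iterator[Tuple[str, Tagged]]:
    # Map side: key each Visits record by url and tag it with its source
    yield url, ("V", user)


def map_page(url: str, pagerank: float) -> Iterator[Tuple[str, Tagged]]:
    # Map side: key each Pages record by url and tag it with its source
    yield url, ("P", pagerank)


def reduce_join(url: str, values: List[Tagged]) -> Iterator[Tuple[Any, str, Any]]:
    # Reduce side: all records sharing a url arrive together; pair every visit with every rank
    users = [v for tag, v in values if tag == "V"]
    ranks = [v for tag, v in values if tag == "P"]
    for user in users:
        for rank in ranks:
            yield user, url, rank


def reduce_side_join(visits, pages) -> List[Tuple[Any, str, Any]]:
    shuffled: Dict[str, List[Tagged]] = defaultdict(list)  # shuffle: group tagged values by url
    for record in visits:
        for key, value in map_visit(*record):
            shuffled[key].append(value)
    for record in pages:
        for key, value in map_page(*record):
            shuffled[key].append(value)
    result = []
    for url in sorted(shuffled):  # reducers see keys in sorted order
        result.extend(reduce_join(url, shuffled[url]))
    return result


print(reduce_side_join(
    [("John", "www.cbn.com", "7:00"), ("Linda", "cnn.com/index.htm", "11:00")],
    [("www.cbn.com", 0.9)],
))
```

A url that appears on only one side (here cnn.com/index.htm) produces no output, which is the inner-join behaviour; the later GROUP and AVG steps would run as a further MapReduce job over this joined output.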
Job Trends – Hadoop
Why SkillSpeed?
ᗍ Course curriculum from industry experts
ᗍ Instructor-led live virtual sessions
ᗍ Lifetime access to course content via LMS
ᗍ 100% placement assistance
ᗍ 24x7 support
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to MapReduce
Module 4: Advanced MapReduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
Corporate Partners
Contact Us
Lines open 24/7
To know more about the course, please contact:
IND: +91-90660-20904
USA: 1866-607-6547 (Toll Free)
Or reach us at