CYS14011 - Rithu P Ravi

CYS14012 - Saumya K

Big Data Hadoop... HDFS Map Reduce

Why Hadoop, and What Is It?

Apache Hadoop is an open-source software framework

A tool to process big data

Rithu P Ravi,SaumyaK — HADOOP 2/30

Outline

1 Big Data

2 Hadoop...

3 HDFS

4 Map Reduce

Big Data

Data that goes beyond the storage and processing power of conventional systems

3 ‘V’s

Volume

Velocity

Variety

Big Data

Exponential growth of data

Challenges for Google, Yahoo, Microsoft, Amazon

Need to go through TBs and PBs of data

Existing tools became inadequate to process such large data sets.

One big elephant, or numerous small chickens?

How to handle data this BIG?

Issues

How to handle systems going up and down (failures)?

How to combine the data from all the systems?

Problem 1: Systems' Ups and Downs

Commodity hardware for data storage and analysis

Chances of failure are very high

Replicate data across several machines

GFS (Google File System)

GFS

Divides data into chunks and stores them in the file system

Can store data in the range of PBs
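The chunk-and-replicate idea can be sketched in a few lines of Python. This is a toy model, not GFS's real placement policy: the 16-byte chunk size, the machine names `m1` to `m4`, the replication factor of 3 and the round-robin placement are all assumptions chosen for illustration (real systems use chunks of many megabytes).

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the input into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, machines: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each chunk to `replication` distinct machines using a toy round-robin policy."""
    if replication > len(machines):
        raise ValueError("replication factor exceeds the number of machines")
    placement = {}
    for chunk_id in range(num_chunks):
        start = chunk_id % len(machines)
        # Consecutive machines (wrapping around) guarantee the replicas are distinct.
        placement[chunk_id] = [machines[(start + k) % len(machines)] for k in range(replication)]
    return placement


data = b"Hadoop stores big files as many smaller chunks."
chunks = split_into_chunks(data, chunk_size=16)
placement = place_replicas(len(chunks), ["m1", "m2", "m3", "m4"], replication=3)
for chunk_id, replicas in placement.items():
    print(chunk_id, chunks[chunk_id], replicas)
```

With three replicas on distinct machines, any single machine can fail and every chunk still has two live copies.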

Problem 2: How to combine the data?

Analyze data across different machines.

Merging the results means data has to travel across the network.

Doing this is notoriously challenging

Again Google: MapReduce

Map Reduce

Provides a programming model

Abstracts disk reads and writes

Converts data into (key, value) pairs

Two Phases

Map

Reduce
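To make the (key, value) idea concrete: Hadoop's default text input format hands each mapper records whose key is the byte offset of a line and whose value is the line itself. The Python sketch below imitates that behavior on an in-memory byte string; the function name `text_records` is hypothetical and not part of any Hadoop API.

```python
def text_records(data: bytes):
    """Yield (byte_offset, line) pairs, imitating a line-oriented input format.

    Hypothetical helper: the newline is not part of the value, and a final
    line without a trailing newline is still emitted.
    """
    offset = 0
    while offset < len(data):
        end = data.find(b"\n", offset)
        if end == -1:
            end = len(data)
        # Strip a Windows-style carriage return if present.
        line = data[offset:end].rstrip(b"\r")
        yield offset, line.decode("utf-8")
        offset = end + 1


for key, value in text_records(b"big data\nhadoop hdfs\nmap reduce\n"):
    print(key, value)
```

Every record a mapper receives is already a (key, value) pair, so user code never deals with raw disk reads.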

HADOOP

A reliable shared storage system

Analysis system

History

Google was the first to launch GFS and MapReduce

Published papers: GFS in 2003, MapReduce in 2004

A brand new technology

Already well proven inside Google by 2004

History

Doug Cutting

Open-source version of the MapReduce system, called Hadoop

Yahoo and others rallied around to support this effort.

Hadoop is now a core part of Facebook, Yahoo, LinkedIn, and Twitter

Core Concepts

HDFS

Map Reduce

HDFS: Hadoop Distributed File System

Streaming access to very large files on a commodity cluster

1 Very Large Files: MBs to PBs

2 Streaming

Write-once, read-many approach

No modification of files once written

Time to read the whole dataset matters more than the latency of reading the first record

3 Commodity Cluster

No high-end servers

Yes, a high chance of failure (but HDFS is tolerant enough)

Data is replicated across nodes
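The write-once, read-many model can be illustrated with a toy in-memory store that accepts each file exactly once and afterwards only serves sequential reads. `WriteOnceStore` is a hypothetical class for illustration; it does not model the real HDFS client API, and it ignores appends.

```python
class WriteOnceStore:
    """Toy file store (hypothetical): a path can be written once, then only read."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def create(self, path: str, data: bytes) -> None:
        if path in self._files:
            # Existing files are never modified in place.
            raise FileExistsError(f"{path} already exists and cannot be modified")
        self._files[path] = bytes(data)

    def read(self, path: str, chunk_size: int = 4):
        """Stream the whole file sequentially, chunk by chunk."""
        data = self._files[path]
        for i in range(0, len(data), chunk_size):
            yield data[i:i + chunk_size]


store = WriteOnceStore()
store.create("/logs/day1", b"line1\nline2\n")
print(b"".join(store.read("/logs/day1")))
try:
    store.create("/logs/day1", b"overwrite")
except FileExistsError as err:
    print("rejected:", err)
```

Forbidding in-place updates is what lets the file system favor whole-file streaming throughput over random access.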

HDFS: Hadoop Distributed File System (cont.)

Services

Masters

Name Node

Secondary Name Node

Job Tracker

Slaves

Data Node

Task Tracker

HDFS: Hadoop Distributed File System (cont.)

Name Node

Master Node

Maintains the file-system namespace

Stores the metadata

Secondary Name Node

Periodically merges the edit log into the fsimage file (checkpoint)

Data Node

Slaves

Actual storage of the data blocks
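The division of labor between the Name Node and the Secondary Name Node can be sketched under simplifying assumptions: the namespace is a plain dictionary from file path to block IDs, the edit log is a list of hypothetical `("create", path, blocks)` and `("delete", path)` records, and a checkpoint replays the log onto the last fsimage snapshot. Real HDFS stores fsimage and the edit log on disk in its own formats; none of these names are Hadoop APIs.

```python
import copy


def apply_edit(namespace: dict[str, list[int]], edit: tuple) -> None:
    """Apply one (hypothetical) edit-log record to the namespace in place."""
    op = edit[0]
    if op == "create":
        _, path, blocks = edit
        namespace[path] = list(blocks)
    elif op == "delete":
        _, path = edit
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage: dict[str, list[int]], edit_log: list[tuple]) -> dict[str, list[int]]:
    """Secondary-Name-Node-style checkpoint: merge the edit log into a new fsimage."""
    new_image = copy.deepcopy(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image


fsimage = {"/data/a.txt": [1, 2]}
edit_log = [("create", "/data/b.txt", [3]), ("delete", "/data/a.txt"), ("create", "/data/c.txt", [4, 5])]
fsimage = checkpoint(fsimage, edit_log)
edit_log = []  # after a checkpoint, a fresh edit log can be started
print(fsimage)
```

The sketch leaves out block locations on purpose: in HDFS the Name Node learns which Data Nodes hold each block from reports sent by the Data Nodes themselves, not from fsimage.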

HDFS Architecture

Map Reduce

Large-scale data processing in parallel.

It provides

Automatic parallelization and distribution

Fault tolerance

Two Phases in Map Reduce

Map

Reduce
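The two promises above, automatic parallelization and fault tolerance, can be imitated on one machine: run independent tasks concurrently and re-run any task that fails. The sketch uses Python's `concurrent.futures`; the `flaky_square` task and its simulated one-time failure are hypothetical, and a real framework reschedules failed tasks on other nodes rather than retrying in the same process.

```python
from concurrent.futures import ThreadPoolExecutor

failed_once: set[int] = set()


def flaky_square(x: int) -> int:
    """Hypothetical task that fails the first time it sees an odd input."""
    if x % 2 == 1 and x not in failed_once:
        failed_once.add(x)
        raise RuntimeError(f"simulated node failure on input {x}")
    return x * x


def run_with_retries(task, inputs, max_attempts=3, workers=4):
    """Run task(x) for every x in parallel, re-running failed tasks up to max_attempts times."""
    results = {}
    pending = list(range(len(inputs)))  # indices of tasks not yet completed
    for _ in range(max_attempts):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {i: pool.submit(task, inputs[i]) for i in pending}
        # Leaving the with-block waits for every submitted task to finish.
        pending = [i for i, fut in futures.items() if fut.exception() is not None]
        for i, fut in futures.items():
            if fut.exception() is None:
                results[i] = fut.result()
        if not pending:
            return [results[i] for i in range(len(inputs))]
    raise RuntimeError(f"tasks {pending} still failing after {max_attempts} attempts")


print(run_with_retries(flaky_square, [1, 2, 3, 4, 5]))
```

Because each task is independent, a failed one can simply be run again without touching the others; this is the property that makes the framework's fault tolerance cheap.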

Map Reduce

Job Tracker

Master

Manages the jobs in the cluster

Task Tracker

Slaves

Run the map and reduce tasks
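How the master hands work to its slaves can be sketched as a toy scheduler: each Task Tracker offers a fixed number of free task slots, the Job Tracker fills those slots from a queue of pending tasks, and the tasks of a Task Tracker that fails go back into the queue. The slot counts, tracker names and the `assign` and `handle_failure` functions are hypothetical; the real Job Tracker also uses heartbeats and data locality, which are omitted here.

```python
from collections import deque


def assign(pending: deque, free_slots: dict[str, int]) -> dict[str, list[str]]:
    """Fill each tracker's free slots from the pending-task queue, one task per tracker per round."""
    assignment = {tracker: [] for tracker in free_slots}
    slots = dict(free_slots)
    while pending and any(slots.values()):
        for tracker in slots:
            if pending and slots[tracker] > 0:
                assignment[tracker].append(pending.popleft())
                slots[tracker] -= 1
    return assignment


def handle_failure(assignment: dict[str, list[str]], failed_tracker: str, pending: deque) -> None:
    """Put the failed tracker's tasks back in the queue so they can run elsewhere."""
    for task in assignment.pop(failed_tracker, []):
        pending.append(task)


pending = deque(["map-0", "map-1", "map-2", "map-3", "map-4"])
assignment = assign(pending, {"tt1": 2, "tt2": 2})
print(assignment, list(pending))
handle_failure(assignment, "tt1", pending)
retry = assign(pending, {"tt3": 3})
print(retry)
```

The master only tracks which task runs where; the slaves do the actual work, so losing a slave costs a re-run of its tasks rather than the whole job.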

Map Reduce

Map Phase

map(inKey, inValue) → list(outKey, intermediateValue)

Processes input key/value pair

Produces a set of intermediate pairs

Reduce Phase

reduce(outKey, list(intermediateValue)) → list(outValue)

Combines all intermediate values for a particular key

Produces a set of merged output values (usually just one)
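The classic illustration of these two signatures is word count. The sketch below runs the map, shuffle (group-by-key) and reduce steps in a single Python process; the function names `map_fn`, `reduce_fn` and `run_job` are hypothetical, and in Hadoop the framework, not user code, performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(in_key: int, in_value: str) -> Iterator[tuple[str, int]]:
    """map(inKey, inValue) -> list(outKey, intermediateValue): emit (word, 1) per word."""
    for word in in_value.lower().split():
        yield word, 1


def reduce_fn(out_key: str, values: Iterable[int]) -> Iterator[int]:
    """reduce(outKey, list(intermediateValue)) -> list(outValue): a single summed count."""
    yield sum(values)


def run_job(records):
    groups = defaultdict(list)
    # Map phase, followed by the shuffle: group intermediate values by key.
    for in_key, in_value in records:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    # Reduce phase: one call per distinct key, keys processed in sorted order.
    return {key: list(reduce_fn(key, groups[key])) for key in sorted(groups)}


lines = [(0, "big data needs hadoop"), (22, "hadoop stores big data")]
print(run_job(lines))
```

Each reduce call returns a one-element list, matching the "usually just one" output value per key.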

Happy Hadooping.... :)
