  • HADOOP

    CYS14011 - Rithu P Ravi

    CYS14012 - Saumya K

  • Big Data Hadoop... HDFS Map Reduce

    Why and What HADOOP?...

    Apache Hadoop is an open-source software framework

    A tool to process big data

    Rithu P Ravi,SaumyaK — HADOOP 2/30

  • Outline

    1 Big Data

    2 Hadoop...

    3 HDFS

    4 Map Reduce

  • Big Data

    Data beyond the storage capacity and processing power of conventional systems

    The 3 ‘V’s

    Volume

    Velocity

    Variety

  • Big Data

    Exponential growth of data

    Challenges for Google, Yahoo, Microsoft, Amazon

    Need to go through TBs and PBs of data

    Existing tools became inadequate to process such large data sets.

  • One big elephant, or numerous small chickens..?

  • How to handle such BIG data?

    Issues

    How to handle systems going up and down?

    How to combine the data from all the systems?

  • Problem 1: System ups and downs

    Commodity hardware for data storage and analysis

    Chances of failure are very high

    Replication of data across several machines

    GFS (Google File System)

    GFS

    Divides data into chunks and stores them in the file system

    Can store data in the range of PBs as well
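The chunk-and-replicate idea on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the tiny chunk size, and the round-robin placement are assumptions made for the example, not how GFS actually chooses replica locations.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, machines: list[str], replication: int) -> dict[int, list[str]]:
    """Assign every chunk to `replication` distinct machines, round-robin (assumed policy)."""
    if replication > len(machines):
        raise ValueError("need at least as many machines as replicas")
    n = len(machines)
    return {
        chunk_id: [machines[(chunk_id + r) % n] for r in range(replication)]
        for chunk_id in range(num_chunks)
    }


def surviving_chunks(placement: dict[int, list[str]], failed: set[str]) -> set[int]:
    """Chunks that still have at least one replica on a healthy machine."""
    return {c for c, hosts in placement.items() if any(h not in failed for h in hosts)}


chunks = split_into_chunks(b"abcdefghij", chunk_size=4)
placement = place_replicas(len(chunks), ["m1", "m2", "m3", "m4"], replication=2)
still_readable = surviving_chunks(placement, failed={"m2"})
```

With two replicas per chunk, losing any single machine leaves every chunk readable, which is the point of replicating on failure-prone commodity hardware.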

  • Problem 2: How to combine the data?

    Analyze data across different machines.

    Merge: data has to travel across the network.

    Doing this is notoriously challenging

    Again Google: MapReduce

  • Map Reduce

    Provides a programming model

    Abstracts disk reads and writes

    Converts data to (key, value) pairs

    Two Phases

    Map

    Reduce
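As a rough illustration of "abstracts disk reads and writes" and "converts data to (key, value) pairs", the sketch below turns raw text into pairs of (byte offset, line). The function name `read_records` is hypothetical; the choice of offset as key and line as value is an assumption modelled on how Hadoop's default text input is usually described.

```python
import io


def read_records(stream):
    """Yield (byte_offset, line_text) pairs from a binary stream.

    The map function never touches the file itself; it only sees these pairs.
    """
    offset = 0
    for raw_line in stream:
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)  # advance by the full line, newline included


# An in-memory stream stands in for a file on disk.
records = list(read_records(io.BytesIO(b"big data\nhadoop\n")))
```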

  • HADOOP

    A reliable shared storage system (HDFS)

    An analysis system (MapReduce)

  • History

    Google was the first to build GFS and MapReduce

    Published papers describing them: GFS in 2003, MapReduce in 2004

    A brand new technology

    Already well proven inside Google by 2004

  • History

    Doug Cutting

    Created an open-source version of the MapReduce system, called Hadoop

    Yahoo and others rallied around to support this effort.

    Hadoop is now a core part of Facebook, Yahoo, LinkedIn, Twitter

  • Core Concepts

    HDFS

    Map Reduce

  • HDFS: Hadoop Distributed File System

    Streaming very large files on a commodity cluster

    1 Very Large Files: MBs to PBs

    2 Streaming

    Write once, read many approach

    No modification

    Time to read the whole data set matters more than the latency of reading the first record

    3 Commodity Cluster

    No high-end servers

    Yes, high chance of failure (but HDFS is tolerant enough)

    Replication is done

  • HDFS: Hadoop Distributed File System

    Services

    Masters

    Name Node

    Secondary Name Node

    Job Tracker

    Slaves

    Data Node

    Task Tracker

  • HDFS: Hadoop Distributed File System

    Name Node

    Master node

    Maintains the file system namespace

    Holds the metadata

    Secondary Name Node

    Periodically updates the fsimage file by merging in the edit log

    Data Node

    Slaves

    Actual storage
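The division of labour on this slide (Name Node holds metadata, Data Nodes hold the actual data) can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the class and function names, the 4-byte block size, and the round-robin placement. It is not the HDFS API.

```python
class DataNode:
    """Slave: holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Master: holds only metadata (file name -> block ids -> hosts)."""

    def __init__(self, datanode_names, replication=2):
        self.datanode_names = list(datanode_names)
        if replication > len(self.datanode_names):
            raise ValueError("need at least as many data nodes as replicas")
        self.replication = replication
        self.namespace = {}
        self._next_block_id = 0

    def allocate(self, filename, num_blocks):
        if filename in self.namespace:
            raise FileExistsError(filename)  # write once: no modification
        n = len(self.datanode_names)
        layout = []
        for _ in range(num_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            hosts = [self.datanode_names[(block_id + r) % n] for r in range(self.replication)]
            layout.append((block_id, hosts))
        self.namespace[filename] = layout
        return layout

    def locate(self, filename):
        return self.namespace[filename]


def put(namenode, datanodes, filename, data, block_size=4):
    """Client write: ask the Name Node where blocks go, then send data to the Data Nodes."""
    pieces = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for (block_id, hosts), piece in zip(namenode.allocate(filename, len(pieces)), pieces):
        for host in hosts:
            datanodes[host].blocks[block_id] = piece


def get(namenode, datanodes, filename, down=frozenset()):
    """Client read: look up block locations, then read each block from a live replica."""
    out = b""
    for block_id, hosts in namenode.locate(filename):
        live = next(h for h in hosts if h not in down)
        out += datanodes[live].blocks[block_id]
    return out


nodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
name_node = NameNode(nodes)
put(name_node, nodes, "/logs/a.txt", b"hello hadoop")
```

Note that the file contents never pass through `NameNode`; it only answers "which blocks, on which hosts", which is why losing one Data Node does not lose the file.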

  • HDFS Architecture

    [Figure: HDFS architecture diagram]

  • Map Reduce

    Large-scale data processing in parallel.

    It provides

    Automatic parallelization and distribution

    Fault tolerance

    Two Phases in Map Reduce

    Map

    Reduce

  • Map Reduce

    Job Tracker

    Master

    Manages the jobs in the cluster

    Task Tracker

    Slaves

    Responsible for running the map and reduce tasks
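The master/slave split between Job Tracker and Task Trackers can be sketched as below. This is a deliberately simplified toy: the class names mirror the slide, but the `TrackerDown` exception, the round-robin assignment, and the immediate retry on another tracker are assumptions for illustration and not Hadoop's actual scheduling protocol.

```python
class TrackerDown(Exception):
    """Raised when a task is sent to a tracker that is not running."""


class TaskTracker:
    """Slave: runs whatever task the Job Tracker hands it."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def run(self, task, arg):
        if not self.healthy:
            raise TrackerDown(self.name)
        return task(arg)


class JobTracker:
    """Master: splits a job into tasks and assigns them to Task Trackers."""

    def __init__(self, trackers):
        self.trackers = list(trackers)

    def run_job(self, task, inputs):
        results, assigned_to = [], []
        n = len(self.trackers)
        for index, arg in enumerate(inputs):
            # Start at a different tracker for each task; on failure move to the next one.
            for attempt in range(n):
                tracker = self.trackers[(index + attempt) % n]
                try:
                    results.append(tracker.run(task, arg))
                except TrackerDown:
                    continue
                assigned_to.append(tracker.name)
                break
            else:
                raise RuntimeError("no healthy task tracker left")
        return results, assigned_to


job_tracker = JobTracker([TaskTracker("tt1"), TaskTracker("tt2", healthy=False), TaskTracker("tt3")])
results, assigned_to = job_tracker.run_job(len, ["a", "bb", "ccc"])
```

The task that would have gone to the failed tracker is simply rerun elsewhere, so the job still completes.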

  • Map Reduce

    Map Phase

    map(inKey, inValue) -> list(outKey, intermediateValue)

    Processes an input key/value pair

    Produces a set of intermediate pairs

    Reduce Phase

    reduce(outKey, list(intermediateValue)) -> list(outValue)

    Combines all intermediate values for a particular key

    Produces a set of merged output values (usually just one)
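The two signatures above can be tried out with a small single-process word count. This is a minimal sketch of the programming model only: `run_mapreduce` is a hypothetical driver that does the grouping by key in memory, whereas a real cluster would distribute the map and reduce calls across machines.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """map(inKey, inValue) -> list(outKey, intermediateValue): emit (word, 1) per word."""
    return [(word.lower(), 1) for word in in_value.split()]


def reduce_fn(out_key, intermediate_values):
    """reduce(outKey, list(intermediateValue)) -> list(outValue): sum the counts."""
    return [sum(intermediate_values)]


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map over every record, group intermediate values by key, then reduce each group."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


# Input records are (offset, line) pairs; the offsets are only illustrative here.
records = [(0, "big data needs big tools"), (25, "hadoop handles big data")]
counts = run_mapreduce(records, map_fn, reduce_fn)
```

Each reduce call returns a list with a single element, matching the slide's remark that the merged output is usually just one value per key.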

  • Happy Hadooping.... :)