
Course Content

Big Data / Hadoop Admin

[email protected] www.collaberatact.com


Topics covered in the training

1. Module – 1 & Session - 1

a. Understanding Big Data Basics

b. Big Data Use Cases

c. Introduction to Hadoop

d. Understanding Hadoop Ecosystem

e. Introduction to HDFS

i. Introduction to Namenode

ii. Introduction to Datanode

iii. Introduction to Secondary Namenode

f. Introduction to MapReduce

i. Introduction to JobTracker

ii. Introduction to TaskTracker

g. Summarizing Hadoop Architecture

h. Roles and Responsibilities of a Hadoop Administrator

2. Module – 2 & Session – 2 & 3

a. Linux internals

i. Required Linux commands

ii. Linux basics

b. Hadoop Cluster Installation Pre-requisites

i. Pre-requisites of Hadoop Installation

1. Software Download

2. Preparing your VM

3. Enabling VM with VMware

4. Understanding mandatory changes in the operating system

c. Installation and Configuration

i. Understanding Hadoop cluster installation modes

ii. Understanding Hadoop version 1 installation and configuration

iii. Passwordless SSH setup

3. Session - 4

a. Hands-On Practice for creating a Hadoop cluster

i. Individual assistance with practicing Hadoop cluster installation

4. Module – 3 & Session - 5

a. Hadoop Cluster Planning

i. Recommended Hadoop cluster configuration

1. Hardware/Software/Network

2. Recommended configuration for Master and Slave Nodes

3. Sample Base configuration

4. Different Hadoop distributions in the market

b. Hadoop performance tuning

i. Important Hadoop tuning parameters to understand

ii. Hadoop Cluster Benchmarking Jobs – How to run the jobs

5. Module – 4 & Session – 6 & 7

a. Job Schedulers

i. FIFO Scheduler

ii. Fair Scheduler

b. Backup and Recovery

i. Data backup

ii. Meta-data backup


iii. Hadoop Quotas

iv. Safemode

v. Hadoop Ports

c. DistCP

d. Security

i. How to secure your cluster using Kerberos

e. Upgrades

i. Upgrading Hadoop cluster from Hadoop 1 to Hadoop 2

6. Module – 5 & Session – 8

a. Hadoop 2.0 new features

b. YARN

i. Understanding Resource Manager

ii. Understanding Application Master

iii. Understanding Node Manager

iv. Understanding Hadoop 2 Job Execution Framework

c. Hadoop 2 Multi-node cluster creation

i. Pre-requisites of Hadoop Installation

ii. Software Download

iii. Preparing your VM

iv. Enabling VM with VMware

v. Understanding mandatory changes in the operating system

vi. Installation and Configuration

vii. Understanding Hadoop version 2 installation and configuration

viii. Passwordless SSH setup

7. Session - 9

a. Practice Hadoop 2 multi-node Cluster Creation

i. Individual assistance with practicing Hadoop 2 cluster installation

b. Sample YARN job execution

8. Module – 6 & Session – 10 & 11

a. Understanding Issues of Hadoop 1

b. Understanding improvements in Hadoop 2

c. Namenode Federation

i. Segregating the HDFS namespace across multiple Namenodes

d. Namenode – High Availability

i. Achieving Namenode High Availability using the Quorum Journal Manager

ii. Achieving Namenode High Availability using the Network File System (NFS)

9. Session - 12

a. Implementing Namenode High Availability

i. Individual assistance with achieving Namenode High Availability

10. Module – 7 & Session – 13 & 14

a. Hadoop Ecosystem Introduction

i. Understanding the integration of Hadoop ecosystem

b. Getting started with Hive

i. What is Hive

ii. Architecture of Hive

iii. Understanding Hive metastore concepts


c. HBase

i. Understanding HBase Basics

ii. Understanding HBase storage Model

iii. Understanding HBase Architecture

iv. Cluster Installation and Configuration

d. Pig

i. What is Pig?

ii. How does Pig integrate with a Hadoop cluster?

iii. Demo of Pig Jobs using MapReduce

e. Sqoop

i. What is Sqoop?

ii. How do you import data from an RDBMS into HDFS and export it back using Sqoop?

iii. Example of Sqoop jobs using MySQL

f. Flume

i. What is Flume?

ii. Sample Flume jobs

11. Module – 8 & Session - 15

a. Understanding the internals of Cloudera Manager

b. Understanding the automation of Hadoop installation using Cloudera Manager

c. Understanding Cloudera Hadoop Distribution and Cloudera Manager

d. Understanding the underlying directory structure of Cloudera Hadoop

e. Cloudera Hadoop Cluster Installation – CDH
