14
HADOOP ADMIN ONLINE TRAINING Glory IT Technologies

Hadoop admin online training

Embed Size (px)

Citation preview

Page 1: Hadoop admin online training

HADOOP ADMIN ONLINE TRAINING

Glory IT Technologies

Page 2: Hadoop admin online training

Prerequisites

Knowledge of Hadoop and Distributed Computing.

Page 3: Hadoop admin online training

Module 1 – Introduction to Hadoop

The amount of data processing in today’s life What Hadoop is why it is important? Hadoop comparison with traditional systems Hadoop history Hadoop main components and architecture

Page 4: Hadoop admin online training

Module 2 – Hadoop Distributed File System (HDFS)

HDFS overview and design HDFS architecture HDFS file storage Component failures and recoveries Block placement Balancing the Hadoop cluster

Page 5: Hadoop admin online training

Module 3 – Planning your Hadoop cluster

Planning a Hadoop cluster and its capacity Hadoop software and hardware configuration HDFS Block replication and rack awareness Network topology for Hadoop cluster

Page 6: Hadoop admin online training

Module 4 – Hadoop Deployment

Different Hadoop deployment types Hadoop distribution options Hadoop competitors Hadoop installation procedure Distributed cluster architecture

Page 7: Hadoop admin online training

Module 5 – Working with HDFS

Ways of accessing data in HDFS Common HDFS operations and commands Different HDFS commands Internals of a file read in HDFS Data copying with ‘distcp’

Page 8: Hadoop admin online training

Module 6 -Mapreduce Abstraction

What MapReduce is and why it is popular The Big Picture of the MapReduce MapReduce process and terminology MapReduce components failures and

recoveries Working with MapReduce

Page 9: Hadoop admin online training

Module 7 – Hadoop Cluster Configuration

Hadoop configuration overview and important configuration file

Configuration parameters and values HDFS parameters MapReduce parameters Hadoop environment setup ‘Include’ and ‘Exclude’ configuration files

Page 10: Hadoop admin online training

Module 8 – Hadoop Administration and Maintenance

Namenode/Data node directory structures and files

File system image and Edit log The Checkpoint Procedure Namenode failure and recovery procedure Safe Mode Metadata and Data backup Potential problems and solutions / what to look

for Adding and removing nodes

Page 11: Hadoop admin online training

Module 9 – Hadoop Monitoring and Troubleshooting

Best practices of monitoring a Hadoop cluster Using logs and stack traces for monitoring and

troubleshooting Using open-source tools to monitor Hadoop

cluster

Page 12: Hadoop admin online training

Module 10 – Job Scheduling

How to schedule Hadoop Jobs on the same cluster

Default Hadoop FIFO Schedule Fair Scheduler and its configuration

Page 13: Hadoop admin online training

Module 11 – Hadoop Multi Node Cluster Setup and Running Map Reduce Jobs on Amazon Ec2

Hadoop Multi Node Cluster Setup using Amazon ec2 – Creating 4 node cluster setup

Running Map Reduce Jobs on Cluster

Page 14: Hadoop admin online training

THANK YOU