DESCRIPTION
This is a straightforward tutorial for those who are going to use HDFS in an academic environment on their notebooks or PCs.
Distributed Data Processing Workshop
Shahid Beheshti University Campus
Faculty of Computer Science and Engineering
Course: Distributed Databases
Instructor: Dr. Hadi Tabatabaei
Presented by: Abolfazl Sedighi, Azar 1393 (December 2014)
Apache Hadoop 2.x Cluster Installation
Amir Sedighi (@amirsedighi)
http://hexican.com
Dec 2014
References
● http://hadoop.apache.org/docs/r2.2.0/
● http://www.vasanthivuppuluri.com/hadoop/installing-hadoop-2-5-1-on-64-bit-ubuntu-14-01/
● https://sites.google.com/site/hadoopandhive/home
Topics
● Assumptions
● First Node
– Installing Java
– Downloading and Extracting Hadoop
– Hadoop and Java Env Variables
– Disabling IPv6
– Configuring Hadoop
● Cloning
● HDFS
– Starting HDFS
● HDFS Health
● FS Commands
● Reclaiming Space
● Reducing Replication Factor
Assumptions
● You are already familiar with Linux.
– http://www.slideshare.net/AmirSedighi/distrinuted-data-processing-workshop-sbu
Installing Java
● $ sudo apt-get install default-jdk
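● To confirm the JDK installed correctly and to find the path needed for JAVA_HOME later, these checks should work on Ubuntu (the reported path will vary per machine):
$ java -version
$ readlink -f $(which javac)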
Downloading and Extracting Hadoop
● http://hadoop.apache.org/releases.html
● $ tar -zxvf hadoop-2.2.0.tar.gz
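● As a sketch, the tarball can also be fetched directly from the Apache archive before extraction; the exact mirror URL is an assumption and may vary:
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ tar -zxvf hadoop-2.2.0.tar.gz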
Hadoop and Java Env Variables
● Append the following definitions to /etc/profile or ~/.bashrc
export HADOOP_PREFIX="/home/amir/hadoop-2.2.0"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_HOME=/usr/java/jdk1.7.0_55
export PATH=$PATH:$JAVA_HOME/bin:/home/amir/hadoop-2.2.0/bin:/home/amir/hadoop-2.2.0/sbin
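● Reload the profile so the new variables take effect in the current shell, then verify that the hadoop binary resolves:
$ source ~/.bashrc
$ hadoop version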
Disabling IPv6
● $ sudo nano /etc/sysctl.conf
# Disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
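● The new settings can be applied without waiting for a reboot; a value of 1 in the check below means IPv6 is disabled:
$ sudo sysctl -p
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6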
Hadoop Configuration
● You will need to create or modify the following files inside hadoop/etc/hadoop:
– slaves
– core-site.xml
– yarn-site.xml
– hdfs-site.xml
– hadoop-env.sh
slaves
● List all DataNodes in the slaves file.
slave1
slave2
slave3
slaves
Create the slaves file in the hadoop/etc/hadoop folder:
u01
u02
u03
u04
u05
u06
...
etc/hosts and hadoop/etc/hadoop/slaves
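The original slide shows these two files side by side as a screenshot. A minimal sketch of matching /etc/hosts entries, assuming a private LAN (the IP addresses are hypothetical, and every node should carry the same entries):
192.168.1.101 u01
192.168.1.102 u02
192.168.1.103 u03
...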
core-site.xml
● Edit core-site.xml and apply the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://u01/</value>
    <description>NameNode URI</description>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>u01</value>
    <description>The hostname of the RM.</description>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
● Note: mapreduce.framework.name is conventionally set in mapred-site.xml rather than yarn-site.xml.
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/amir/hadoop-2.2.0/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/amir/hadoop-2.2.0/hdfs/namenode</value>
  </property>
</configuration>
hadoop-env.sh
● Add the following:
– export JAVA_HOME=/usr/java/jdk1.7.0_55
Reboot
● $ sudo reboot
Cloning
● Extend the cluster by cloning the configured node.
– NOTE: Find the instructions here, and see the sketch below for the per-clone steps:
● http://www.slideshare.net/AmirSedighi/distrinuted-data-processing-workshop-sbu
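A minimal sketch of the per-clone steps, assuming Ubuntu and the amir user from the paths above; the SSH key setup is what lets start-dfs.sh reach every node without a password:
On each clone:
$ sudo nano /etc/hostname   (set a unique name, e.g. u02)
On the master (u01):
$ ssh-keygen -t rsa
$ ssh-copy-id amir@u02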
HDFS
● The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
● It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.
● HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.
● HDFS provides high throughput access to application data and is suitable for applications that have large data sets.
● HDFS relaxes a few POSIX requirements to enable streaming access to file system data.
● HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project.
HDFS Architecture
DataNodes
start-dfs.sh
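● The NameNode must be formatted once before the first start; with the PATH set earlier, both commands run from the master (u01):
$ hdfs namenode -format
$ start-dfs.sh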
HDFS Health
● $ jps
– NameNode
– DataNode
● Check log files
● Web UI
– http://u01:50070
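● The same health summary, including the list of live DataNodes, is also available from the command line:
$ hdfs dfsadmin -report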
HDFS Health, Live Nodes
Hadoop FS Commands
● cat
● chmod
● chown
● copyFromLocal
● copyToLocal
● cp
● du
● expunge
● get
● ls
● mkdir
● put
● rm
● tail
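A few representative invocations, with hypothetical paths and file names:
$ hadoop fs -mkdir /data
$ hadoop fs -put report.txt /data
$ hadoop fs -ls /data
$ hadoop fs -cat /data/report.txt
$ hadoop fs -get /data/report.txt ./report-copy.txt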
Space Reclamation
● Delete files (see the note below):
– $ hadoop fs -rm /filename
– $ hadoop fs -expunge (empties the trash, actually freeing the space)
● Decrease Replication Factor
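Deleted files sit in the trash until it is expunged; rm also accepts a flag to bypass the trash and reclaim space immediately:
$ hadoop fs -rm -skipTrash /filename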
Changing the Replication Factor of Existing Files
● To set the replication of an individual file to 4:
– $ hdfs dfs -setrep -w 4 /path/to/file
● You can also do this recursively. To change the replication of the entire HDFS to 1:
– $ hdfs dfs -setrep -R -w 1 /
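● To verify the change, fsck reports the replication of each file (the path is hypothetical):
$ hdfs fsck /path/to/file -files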
Questions?