- 1. Installing Hadoop and HBase
- 2. Outline: preparing the environment; installing Hadoop; installing HBase; basic Hadoop commands; basic HBase commands; web management interfaces
- 3. Section: preparing the environment
- 4. Environment setup (1/5): Hadoop runs on GNU/Linux and Win32, but GNU/Linux is the recommended platform. Before installing Hadoop, every node needs a working Java runtime (a JRE or JDK) and ssh.
- 5. Environment setup (2/5): on Ubuntu 14.04, check whether a Java runtime (e.g. OpenJDK) is already installed with java -version:
  ~# java -version
  java version "1.7.0_45" OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-i386) OpenJDK Client VM (build 24.0-b16, mixed mode)
  If OpenJDK is not installed, install it with yum (or with sudo apt-get install on Debian/Ubuntu):
  ~# yum -y install java-1.7.0_45-openjdk
  ~# sudo apt-get install java-1.7.0_45-openjdk
- 6. Environment setup (3/5): Hadoop works with OpenJDK, but the Oracle (Sun) Java JDK is generally recommended; it can be downloaded from Oracle (http://www.oracle.com). Check the release notes for the Java version your Hadoop release requires; Hadoop 2.7.0 requires Java 7 or later, so the older jdk-1.6.0-openjdk is not sufficient for it.
- 7. Environment setup (4/5): copy the JDK installer (jdk-8u45-linux-i586.bin) to /usr, make it executable, and run it:
  ~# chmod +x jdk-8u45-linux-i586.bin
  ~# ./jdk-8u45-linux-i586.bin
  This unpacks the JDK under /usr (here into the jdk1.7.0_45 directory). Then use alternatives to make the Oracle (Sun) JDK the default instead of OpenJDK:
  ~# alternatives --install /usr/bin/java java /usr/jdk1.7.0_45/bin/java 20000 (or 16888)
  ~# alternatives --install /usr/bin/javac javac /usr/jdk1.7.0_45/bin/javac 20000 (or 16888)
- 8. Environment setup (5/5): verify the Java installation:
  ~# java -version
  java version "1.7.0_45" Java(TM) SE Runtime Environment (build 1.7.0_45-b06) Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
  ~# javac -version
  javac 1.7.0_45
  Then install ssh and rsync and restart the ssh daemon:
  ~# sudo apt-get install openssh rsync (or ~# yum -y install openssh rsync)
  ~# /etc/init.d/sshd restart
  The Hadoop steps below are all run as root.
- 9. Section: installing Hadoop
- 10. Hadoop can be run in three modes: Local (Standalone) Mode, Pseudo-Distributed Mode, and Fully-Distributed Mode.
- 11. Local (Standalone) Mode (1/7): download Hadoop from the Apache Hadoop site (http://hadoop.apache.org/); at the time of writing the available releases included Hadoop 2.6.0 and Hadoop 2.7.0. Here we fetch the Hadoop 2.6.0 tar.gz with wget (rpm packages are also available) and unpack it:
  ~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
  ~# tar zxvf hadoop-2.6.0.tar.gz
- 12. Local (Standalone) Mode (2/7): move the unpacked directory to /opt/hadoop, then tell Hadoop where Java is installed by editing conf/hadoop-env.sh:
  ~# mv hadoop-2.6.0 /opt/hadoop
  ~# cd /opt/hadoop/
  /hadoop# vi conf/hadoop-env.sh
- 13. Local (Standalone) Mode (3/7): in hadoop-env.sh, set JAVA_HOME (export JAVA_HOME=/usr/jdk1.7.0_45). If IPv6 causes problems, also add export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true to force IPv4. The relevant part of the file:
  # Command specific options appended to HADOOP_OPTS when specified
  ...
  export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
  export JAVA_HOME=/usr/jdk1.7.0_45 (set JAVA_HOME)
  export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true (force IPv4)
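The two hadoop-env.sh settings above can also be appended with a script; a minimal sketch, assuming the JDK path used in these slides (the HADOOP_CONF default of the current directory is an assumption for illustration; on the cluster it would be /opt/hadoop/conf):

```shell
# Append the two settings from this slide to hadoop-env.sh.
HADOOP_CONF=${HADOOP_CONF:-.}
cat >> "$HADOOP_CONF/hadoop-env.sh" <<'EOF'
export JAVA_HOME=/usr/jdk1.7.0_45
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
EOF
# Show what was written.
grep '^export' "$HADOOP_CONF/hadoop-env.sh"
```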
- 14. Local (Standalone) Mode (4/7): standalone mode needs no configuration beyond JAVA_HOME in conf/hadoop-env.sh. Run bin/hadoop with no arguments to confirm the installation works:
  /hadoop# bin/hadoop
  Usage: hadoop [--config confdir] COMMAND where COMMAND is one of: namenode -format format the DFS filesystem ... or CLASSNAME run the class named CLASSNAME Most commands print help when invoked w/o parameters.
- 15. Local (Standalone) Mode (5/7): to try the grep job in the bundled hadoop-2.6.0-examples.jar, create an input directory and copy the XML files from conf/ into it:
  /hadoop# mkdir input
  /hadoop# cp conf/*.xml input
- 16. Local (Standalone) Mode (6/7): run the grep example from hadoop-2.6.0-examples.jar, searching the input files for strings matching 'config[a-z.]+', then print the result:
  /hadoop# bin/hadoop jar hadoop-2.6.0-examples.jar grep input output 'config[a-z.]+'
  /hadoop# cat output/*
  13 configuration 4 configuration.xsl 1 configure
- 17. Local (Standalone) Mode (7/7): the hadoop-2.6.0-examples.jar grep job refuses to run if the output directory already exists, so remove it before running the job again:
  /hadoop# rm -rf output
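What the grep job computes can be approximated locally with plain grep; a self-contained sketch (the sample.xml content here is made up for illustration, not one of Hadoop's conf files):

```shell
# Count regex matches per distinct match, like the Hadoop grep example does.
mkdir -p input
printf '<configuration>\n  <link>configuration.xsl</link>\n</configuration>\n' > input/sample.xml
grep -ohE 'config[a-z.]+' input/*.xml | sort | uniq -c | sort -rn
# prints "2 configuration" then "1 configuration.xsl"
```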
- 18. Pseudo-Distributed Mode (1/9): pseudo-distributed mode builds on the standalone setup; in addition, three files under conf/ must be edited: core-site.xml, hdfs-site.xml, and mapred-site.xml. Start with core-site.xml:
  /hadoop# vi conf/core-site.xml
- 19. Pseudo-Distributed Mode (2/9): in core-site.xml, set the fs.default.name property to hdfs://localhost:9000.
- 20. Pseudo-Distributed Mode (3/9): edit hdfs-site.xml and set the dfs.replication property to 1:
  /hadoop# vi conf/hdfs-site.xml
- 21. Pseudo-Distributed Mode (4/9): edit mapred-site.xml and set the mapred.job.tracker property to localhost:9001:
  /hadoop# vi conf/mapred-site.xml
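The three pseudo-distributed settings above correspond to XML files like the following; a sketch that writes them with here-documents (the property names and values are exactly the ones from these slides; the CONF default of ./conf is an assumption, standing in for /opt/hadoop/conf):

```shell
# Write the three pseudo-distributed config files.
CONF=${CONF:-conf}; mkdir -p "$CONF"
cat > "$CONF/core-site.xml" <<'EOF'
<configuration>
  <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>
</configuration>
EOF
cat > "$CONF/hdfs-site.xml" <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>
EOF
cat > "$CONF/mapred-site.xml" <<'EOF'
<configuration>
  <property><name>mapred.job.tracker</name><value>localhost:9001</value></property>
</configuration>
EOF
```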
- 22. Pseudo-Distributed Mode (5/9): Hadoop uses ssh to start and stop its daemons, so ssh to localhost must work. On the first connection you are asked to accept the host key (answer yes and press Enter) and then prompted for a password:
  ~# ssh localhost
  The authenticity of host 'localhost (127.0.0.1)' can't be established. RSA key fingerprint is ... Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
  root@localhost's password:
- 23. Pseudo-Distributed Mode (6/9): press Ctrl + C to abort the password prompt, then set up passwordless ssh by generating a key pair with an empty passphrase and authorizing it:
  ~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
  ~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
  Now ssh localhost should log in without a password; type exit to leave:
  ~# ssh localhost
  Last login: Mon May 16 10:04:39 2011 from localhost
  ~# exit
- 24. Pseudo-Distributed Mode (7/9): before starting Hadoop for the first time, format HDFS with bin/hadoop namenode -format:
  /hadoop# bin/hadoop namenode -format
  11/05/16 10:20:27 INFO namenode.NameNode: STARTUP_MSG:
  /************************************************************
  STARTUP_MSG: Starting NameNode
  ...
  11/05/16 10:20:28 INFO namenode.NameNode: SHUTDOWN_MSG:
  /************************************************************
  SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
  ************************************************************/
- 25. Pseudo-Distributed Mode (8/9): bin/start-all.sh starts all the daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker:
  /hadoop# bin/start-all.sh
  starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
  localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host01.out
  localhost: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-root-secondarynamenode-Host01.out
  starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
  localhost: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host01.out
- 26. Pseudo-Distributed Mode (9/9): to run the hadoop-2.6.0-examples.jar grep example on HDFS, upload conf as the input directory with bin/hadoop fs -put, then submit the job:
  /hadoop# bin/hadoop fs -put conf input
  /hadoop# bin/hadoop jar hadoop-2.6.0-examples.jar grep input output 'config[a-z.]+'
- 27. Fully-Distributed Mode (1/14): a fully distributed Hadoop cluster has a master and one or more slaves, and every node needs Java and ssh. The example cluster used here:
  Host01: Namenode + Jobtracker, IP 192.168.1.1
  Host02: Datanode + Tasktracker, IP 192.168.1.2
- 28. Fully-Distributed Mode (2/14): first stop any running Hadoop with stop-all.sh and remove the previous Hadoop installation, the ssh keys, and temporary files:
  /hadoop# /opt/hadoop/bin/stop-all.sh
  ~# rm -rf /opt/hadoop
  ~# rm -rf ~/.ssh
  ~# rm -rf /tmp/*
  Then, on Host01, download Hadoop 2.6.0 again and install it under /opt/hadoop:
  ~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
  ~# tar zxvf hadoop-2.6.0.tar.gz
  ~# mv hadoop-2.6.0 /opt/hadoop
  or, to install under /usr instead: ~# mv hadoop-2.6.0 /usr/hadoop
- 29. Fully-Distributed Mode (3/14): as before, edit conf/hadoop-env.sh under /opt/hadoop and set JAVA_HOME:
  ~# cd /opt/hadoop/
  /hadoop# vi conf/hadoop-env.sh
  # Command specific options appended to HADOOP_OPTS when specified
  export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
  export JAVA_HOME=/usr/jdk1.7.0_45 (set JAVA_HOME)
- 30. Fully-Distributed Mode (4/14): edit conf/core-site.xml:
  /hadoop# vi conf/core-site.xml
- 31. Fully-Distributed Mode (5/14): in core-site.xml, set fs.default.name to hdfs://Host01:9000 and hadoop.tmp.dir to /var/hadoop/hadoop-${user.name}.
- 32. Fully-Distributed Mode (6/14): edit conf/hdfs-site.xml and set dfs.replication to 2:
  /hadoop# vi conf/hdfs-site.xml
- 33. Fully-Distributed Mode (7/14): edit conf/mapred-site.xml and set mapred.job.tracker to Host01:9001:
  /hadoop# vi conf/mapred-site.xml
- 34. Fully-Distributed Mode (8/14): edit conf/masters and list the master (Host01):
  /hadoop# vi conf/masters
  Then edit conf/slaves and replace localhost with the slave Host02:
  /hadoop# vi conf/slaves
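The fully-distributed settings above (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves) as concrete files; a sketch using here-documents, with exactly the host names and values from these slides (the CONF default of ./conf is an assumption, standing in for /opt/hadoop/conf):

```shell
# Write the fully-distributed config files for the Host01/Host02 cluster.
CONF=${CONF:-conf}; mkdir -p "$CONF"
cat > "$CONF/core-site.xml" <<'EOF'
<configuration>
  <property><name>fs.default.name</name><value>hdfs://Host01:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/var/hadoop/hadoop-${user.name}</value></property>
</configuration>
EOF
cat > "$CONF/hdfs-site.xml" <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>
EOF
cat > "$CONF/mapred-site.xml" <<'EOF'
<configuration>
  <property><name>mapred.job.tracker</name><value>Host01:9001</value></property>
</configuration>
EOF
echo Host01 > "$CONF/masters"
echo Host02 > "$CONF/slaves"
```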
- 35. Fully-Distributed Mode (9/14): set up passwordless ssh between the hosts: generate a key pair on Host01, authorize it, and copy the whole .ssh directory to Host02 with scp; then test logging in both directions:
  ~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
  ~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
  ~# scp -r ~/.ssh Host02:~/
  ~# ssh Host02 (from Host01 to Host02)
  ~# ssh Host01 (from Host02 back to Host01)
  ~# exit (back on Host02)
  ~# exit (back on Host01)
- 36. Fully-Distributed Mode (10/14): every node needs the same Hadoop installation; it can be shared with NFS or copied to each slave. Here we copy it from Host01 to Host02 with scp, then format HDFS:
  ~# scp -r /opt/hadoop Host02:/opt/
  /hadoop# bin/hadoop namenode -format
- 37. Fully-Distributed Mode (11/14): sample output of the format command:
  11/05/16 21:52:13 INFO namenode.NameNode: STARTUP_MSG:
  /************************************************************
  STARTUP_MSG: Starting NameNode
  STARTUP_MSG: host = Host01/127.0.0.1
  STARTUP_MSG: args = [-format]
  STARTUP_MSG: version = 0.20.2
  STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
  ************************************************************/
  11/05/16 21:52:13 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
  ...
  11/05/16 21:52:13 INFO namenode.NameNode: SHUTDOWN_MSG:
  /************************************************************
  SHUTDOWN_MSG: Shutting down NameNode at Host01/127.0.0.1
  ************************************************************/
- 38. Fully-Distributed Mode (12/14): start the Hadoop cluster:
  /hadoop# bin/start-all.sh
  starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
  Host02: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host02.out
  starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
  Host02: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host02.out
- 39. Fully-Distributed Mode (13/14): check that the cluster is up with bin/hadoop dfsadmin -report, which prints the state of HDFS:
  /hadoop# bin/hadoop dfsadmin -report
  Configured Capacity: 9231007744 (8.6 GB)
  ...
  Blocks with corrupt replicas: 0
  Missing blocks: 0
  -------------------------------------------------
  Datanodes available: 1 (1 total, 0 dead)
  ...
  DFS Remaining%: 41.88%
  Last contact: Mon May 16 22:15:03 CST 2011
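A scripted check of the report can save eyeballing; a sketch that greps a saved copy of the report (the report.txt content below is the sample output from this slide, written to a local file here so the check is self-contained):

```shell
# Save a copy of the dfsadmin -report output, then grep it for trouble signs.
cat > report.txt <<'EOF'
Configured Capacity: 9231007744 (8.6 GB)
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 1 (1 total, 0 dead)
DFS Remaining%: 41.88%
EOF
grep -q 'Missing blocks: 0' report.txt && grep -q ', 0 dead)' report.txt \
  && echo "HDFS looks healthy"
```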
- 40. Fully-Distributed Mode (14/14): run the hadoop-0.20.2-examples.jar grep example on the cluster: create an HDFS input directory, upload the files from conf/, submit the job, and print the result:
  /hadoop# bin/hadoop fs -mkdir input
  /hadoop# bin/hadoop fs -put conf/* input/
  /hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
  /hadoop# bin/hadoop fs -cat output/part-00000
  19 configuration 6 configuration.xsl 1 configure
- 41. Section: installing HBase
- 42. HBase (1/9): HBase runs on top of Hadoop. It also needs ZooKeeper (which HBase can manage itself) and, on a multi-node cluster, clocks kept in sync with NTP.
- 43. HBase (2/9): download HBase from the HBase site (http://hbase.apache.org/); here we fetch hbase-0.90.2.tar.gz, unpack it into /opt/hbase, and edit conf/hbase-env.sh:
  ~# wget http://apache.cs.pu.edu.tw//hbase/hbase-0.90.2/hbase-0.90.2.tar.gz
  ~# tar zxvf hbase-0.90.2.tar.gz
  ~# mv hbase-0.90.2 /opt/hbase
  ~# cd /opt/hbase/
  /hbase# vi conf/hbase-env.sh
- 44. HBase (3/9): in conf/hbase-env.sh set:
  export JAVA_HOME=/usr/jdk1.6.0_25/
  export HBASE_MANAGES_ZK=true
  export HBASE_LOG_DIR=/tmp/hadoop/hbase-logs
  export HBASE_PID_DIR=/tmp/hadoop/hbase-pids
  Then edit the main HBase configuration file, conf/hbase-site.xml:
  /hbase# vi conf/hbase-site.xml
- 45. HBase (4/9): in conf/hbase-site.xml set the following properties:
  hbase.rootdir = hdfs://Host01:9000/hbase
  hbase.cluster.distributed = true
  hbase.zookeeper.property.clientPort = 2222
  hbase.zookeeper.quorum = Host01,Host02
  hbase.zookeeper.property.dataDir = /tmp/hadoop/hbase-data
- 46. HBase (5/9): and also:
  hbase.tmp.dir = /var/hadoop/hbase-${user.name}
  hbase.master = Host01:60000
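The hbase-site.xml properties above as one concrete file; a sketch that writes it with a here-document (property names and values are exactly those from the two slides above; the CONF default of ./conf is an assumption, standing in for /opt/hbase/conf):

```shell
# Write hbase-site.xml with the property values from these slides.
CONF=${CONF:-conf}; mkdir -p "$CONF"
cat > "$CONF/hbase-site.xml" <<'EOF'
<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://Host01:9000/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.zookeeper.property.clientPort</name><value>2222</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>Host01,Host02</value></property>
  <property><name>hbase.zookeeper.property.dataDir</name><value>/tmp/hadoop/hbase-data</value></property>
  <property><name>hbase.tmp.dir</name><value>/var/hadoop/hbase-${user.name}</value></property>
  <property><name>hbase.master</name><value>Host01:60000</value></property>
</configuration>
EOF
```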
- 47. HBase (6/9): edit conf/regionservers and list the slaves that will run a region server (here just Host02):
  /hbase# vi conf/regionservers
  Then copy Hadoop's configuration files into HBase's conf/ so both use the same settings:
  /hbase# cp /opt/hadoop/conf/core-site.xml conf/
  /hbase# cp /opt/hadoop/conf/mapred-site.xml conf/
  /hbase# cp /opt/hadoop/conf/hdfs-site.xml conf/
- 48. HBase (7/9): replace the Hadoop jar bundled with HBase (lib/hadoop-core-0.20-append-r1056497.jar) with the cluster's own hadoop-0.20.2-core.jar so the versions match, then copy the whole HBase directory to the slave:
  /hbase# rm lib/hadoop-core-0.20-append-r1056497.jar
  /hbase# cp /opt/hadoop/hadoop-0.20.2-core.jar ./lib/
  /hbase# scp -r /opt/hbase Host02:/opt/hbase
- 49. HBase (8/9): start HBase:
  /hbase# bin/start-hbase.sh
  Host02: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host02.out
  Host01: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host01.out
  starting master, logging to /tmp/hadoop/hbase-logs/hbase-root-master-Host01.out
  Host02: starting regionserver, logging to /tmp/hadoop/hbase-logs/hbase-root-regionserver-Host02.out
- 50. HBase (9/9): open the HBase shell and run list to confirm HBase is working:
  /hbase# bin/hbase shell
  HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011
  hbase(main):001:0> list
  TABLE
  0 row(s) in 0.3950 seconds
  hbase(main):002:0>
- 51. Section: basic Hadoop commands
- 52. Hadoop commands (1/7): frequently used commands:
  bin/start-all.sh: start Hadoop
  bin/stop-all.sh: stop Hadoop
  bin/hadoop version: show the Hadoop version
  bin/hadoop dfsadmin -report: report the state of HDFS
  bin/hadoop namenode -format: format HDFS
  bin/hadoop fs -ls /user/root/input: list an HDFS directory
  bin/hadoop fs -mkdir /user/root/tmp: create an HDFS directory
  bin/hadoop fs -put conf/* /user/root/tmp: upload local files to HDFS
  bin/hadoop fs -cat /user/root/tmp/core-site.xml: print an HDFS file
  bin/hadoop fs -get /user/root/tmp/core-site.xml /opt/hadoop/: download a file from HDFS
  bin/hadoop fs -rm /user/root/tmp/core-site.xml: delete an HDFS file
  bin/hadoop fs -rmr /user/root/tmp: delete an HDFS directory recursively
- 53. Hadoop commands (2/7): run bin/hadoop fs with no arguments to list all the HDFS shell commands:
  /hadoop# bin/hadoop fs
  Usage: java FsShell [-ls ] [-lsr ] [-du ] ...
  -files specify comma separated files to be copied to the map reduce cluster
  -libjars specify comma separated jar files to include in the classpath.
  -archives specify comma separated archives to be unarchived on the compute machines.
  The general command line syntax is bin/hadoop command [genericOptions] [commandOptions]
- 54. Hadoop commands (3/7): a MapReduce job packaged as a jar is submitted to Hadoop with:
  bin/hadoop jar [MapReduce job jar] [job name] [job arguments]
  The bundled hadoop-0.20.2-examples.jar contains sample jobs such as grep, wordcount, and pi; run it without arguments to list them:
  /hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar
- 55. Hadoop commands (4/7): jars shipped with Hadoop:
  hadoop-0.20.2-core.jar: the Hadoop core classes (common, HDFS, and MapReduce)
  hadoop-0.20.2-test.jar: Hadoop's test classes
  hadoop-0.20.2-ant.jar: Ant tasks for Hadoop
- 56. Hadoop commands (5/7): bin/hadoop job manages MapReduce jobs; list all jobs with:
  /hadoop# bin/hadoop job -list all
  5 jobs submitted States are: Running : 1 Succeded : 2 Failed : 3 Prep : 4
  JobId State StartTime UserName Priority SchedulingInfo
  job_201105162211_0001 2 1305555169692 root NORMAL NA
  job_201105162211_0002 2 1305555869142 root NORMAL NA
  job_201105162211_0003 2 1305555912626 root NORMAL NA
  job_201105162211_0004 2 1305633307809 root NORMAL NA
  job_201105162211_0005 2 1305633347357 root NORMAL NA
- 57. Hadoop commands (6/7): check a job's status with bin/hadoop job -status [JobID]:
  /hadoop# bin/hadoop job -status job_201105162211_0001
  and review a finished job's history with bin/hadoop job -history [output directory]:
  /hadoop# bin/hadoop job -history /user/root/output
  Hadoop job: job_201105162211_0007
  =====================================
  Job tracker host name: Host01
  job tracker start time: Mon May 16 22:11:01 CST 2011
  User: root JobName: grep-sort
- 58. Hadoop commands (7/7): run bin/hadoop job with no arguments to see all the job subcommands:
  /hadoop# bin/hadoop job
  Usage: JobClient [-submit ] [-status ] [-counter ] [-kill ] [-set-priority ]. Valid values for priorities are: VERY_HIGH HIGH NORMAL LOW VERY_LOW
  ...
  The general command line syntax is bin/hadoop command [genericOptions] [commandOptions]
- 59. Section: basic HBase commands
- 60. HBase shell (1/10): as a running example we build a scores table:
  name | student ID | course:math | course:history
  John | 1 | 80 | 85
  Adam | 2 | 75 | 90
  Start the HBase shell with bin/hbase shell:
  /hbase# bin/hbase shell
  HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011
  hbase(main):001:0>
- 61. HBase shell (2/10): create the scores table with two column families, studentid and course. Syntax: create [table], [column family 1], [column family 2], ...
  hbase(main):001:0> create 'scores', 'studentid', 'course'
  0 row(s) in 1.8970 seconds
  list shows the tables HBase knows about:
  hbase(main):002:0> list
  TABLE scores 1 row(s) in 0.0170 seconds
- 62. HBase shell (3/10): describe [table] shows a table's definition:
  hbase(main):003:0> describe 'scores'
  DESCRIPTION ENABLED ... BLOCKCACHE => 'true'}]} 1 row(s) in 0.0260 seconds
  Insert John's student ID into the scores table. Syntax: put [table], [row], [column], [value]
  hbase(main):004:0> put 'scores', 'John', 'studentid:', '1'
  0 row(s) in 0.0600 seconds
- 63. HBase shell (4/10): store 80 in John's course:math column:
  hbase(main):005:0> put 'scores', 'John', 'course:math', '80'
  0 row(s) in 0.0100 seconds
  and 85 in John's course:history column:
  hbase(main):006:0> put 'scores', 'John', 'course:history', '85'
  0 row(s) in 0.0080 seconds
- 64. HBase shell (5/10): likewise insert Adam's row: studentid 2, course:math 75, course:history 90:
  hbase(main):007:0> put 'scores', 'Adam', 'studentid:', '2'
  0 row(s) in 0.0130 seconds
  hbase(main):008:0> put 'scores', 'Adam', 'course:math', '75'
  0 row(s) in 0.0100 seconds
  hbase(main):009:0> put 'scores', 'Adam', 'course:history', '90'
  0 row(s) in 0.0080 seconds
- 65. HBase shell (6/10): scan the whole scores table. Syntax: scan [table]
  hbase(main):011:0> scan 'scores'
  ROW COLUMN+CELL
  Adam column=course:history, timestamp=1305704304053, value=90
  Adam column=course:math, timestamp=1305704282591, value=75
  Adam column=studentid:, timestamp=1305704186916, value=2
  John column=course:history, timestamp=1305704046378, value=85
  John column=course:math, timestamp=1305703949662, value=80
  John column=studentid:, timestamp=1305703742527, value=1
  2 row(s) in 0.0420 seconds
- 66. HBase shell (7/10): fetch a single row, here John's. Syntax: get [table], [row]
  hbase(main):010:0> get 'scores', 'John'
  COLUMN CELL
  course:history timestamp=1305704046378, value=85
  course:math timestamp=1305703949662, value=80
  studentid: timestamp=1305703742527, value=1
  3 row(s) in 0.0440 seconds
- 67. HBase shell (8/10): scan only the course column family of scores. Syntax: scan [table], {COLUMNS => [column family]}
  hbase(main):011:0> scan 'scores', {COLUMNS => 'course:'}
  ROW COLUMN+CELL
  Adam column=course:history, timestamp=1305704304053, value=90
  Adam column=course:math, timestamp=1305704282591, value=75
  John column=course:history, timestamp=1305704046378, value=85
  John column=course:math, timestamp=1305703949662, value=80
  2 row(s) in 0.0250 seconds
- 68. HBase shell (9/10): scan several columns of scores at once. Syntax: scan [table], {COLUMNS => [[column1], [column2], ...]}
  hbase(main):012:0> scan 'scores', {COLUMNS => ['studentid','course:']}
  ROW COLUMN+CELL
  Adam column=course:history, timestamp=1305704304053, value=90
  Adam column=course:math, timestamp=1305704282591, value=75
  Adam column=studentid:, timestamp=1305704186916, value=2
  John column=course:history, timestamp=1305704046378, value=85
  John column=course:math, timestamp=1305703949662, value=80
  John column=studentid:, timestamp=1305703742527, value=1
  2 row(s) in 0.0290 seconds
- 69. HBase shell (10/10): to delete a table, disable it first, then drop it:
  hbase(main):003:0> disable 'scores'
  0 row(s) in 2.1510 seconds
  hbase(main):004:0> drop 'scores'
  0 row(s) in 1.7780 seconds
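The whole scores walk-through above can be replayed non-interactively; a sketch that writes the shell commands to a file (the /opt/hbase path in the commented-out invocation is the one assumed in the install steps above):

```shell
# Write the scores example as a non-interactive HBase shell script.
cat > scores.hbase <<'EOF'
create 'scores', 'studentid', 'course'
put 'scores', 'John', 'studentid:', '1'
put 'scores', 'John', 'course:math', '80'
put 'scores', 'John', 'course:history', '85'
put 'scores', 'Adam', 'studentid:', '2'
put 'scores', 'Adam', 'course:math', '75'
put 'scores', 'Adam', 'course:history', '90'
scan 'scores'
disable 'scores'
drop 'scores'
exit
EOF
# To run it on the cluster built in these slides:
# /opt/hbase/bin/hbase shell scores.hbase
wc -l scores.hbase
```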
- 70. Section: web management interfaces
- 71. Web interfaces (1/2): Hadoop ships with built-in web UIs; open them in a browser (e.g. Mozilla Firefox): the HDFS NameNode status page at http://localhost:50070 and the MapReduce Jobtracker status page at http://localhost:50030.
- 72. Web interfaces (2/2): HBase's web UIs: the HBase Master, on the master host, at http://localhost:60010/; each Region Server, on the slave hosts, at http://localhost:60030/; and ZooKeeper status, on the master, at http://localhost:60010/zk.jsp.