
Hadoop Setup and Configuration


  1. Hadoop Setup and Configuration
  2. Outline: environment setup; installing Hadoop; installing HBase; common Hadoop
     commands; basic HBase shell operations; web administration interfaces.
  3. Section: Environment Setup
  4. Environment Setup (1/5): Hadoop runs on GNU/Linux and Win32; GNU/Linux is the
     recommended platform. Hadoop requires Java and ssh: a Java Runtime Environment
     (JRE) suffices to run Hadoop, while developing against it needs a full JDK.
  5. Environment Setup (2/5): On Ubuntu 14.04, check whether Java is already installed
     with java -version:
     ~# java -version
     java version "1.7.0_45"
     OpenJDK Runtime Environment
     OpenJDK Client VM (build 24.0-b16, mixed mode)
     If OpenJDK is missing, install it with the distribution's package manager:
     ~# sudo apt-get install openjdk-7-jdk          (Debian/Ubuntu)
     ~# yum -y install java-1.7.0-openjdk           (RHEL/CentOS)
  6. Environment Setup (3/5): Hadoop works with OpenJDK, but the Oracle (Sun) Java JDK
     is the commonly recommended JVM for Hadoop clusters; it can be downloaded from
     Oracle (http://www.oracle.com). Note that newer Hadoop releases (2.7.0 and later)
     require Java 7 or newer.
  7. Environment Setup (4/5): Unpack the Oracle JDK under /usr:
     ~# tar zxvf jdk-7u45-linux-i586.tar.gz -C /usr
     This creates /usr/jdk1.7.0_45. Register it with alternatives so it takes
     precedence over OpenJDK:
     ~# alternatives --install /usr/bin/java java /usr/jdk1.7.0_45/bin/java 20000
     ~# alternatives --install /usr/bin/javac javac /usr/jdk1.7.0_45/bin/javac 20000
  8. Environment Setup (5/5): Verify the Java installation:
     ~# java -version
     java version "1.7.0_45"
     Java(TM) SE Runtime Environment (build 1.7.0_45-b06)
     ~# javac -version
     javac 1.7.0_45
     Hadoop also needs ssh and rsync:
     ~# sudo apt-get install openssh-server rsync    (Debian/Ubuntu)
     ~# yum -y install openssh-server rsync          (RHEL/CentOS)
     ~# /etc/init.d/sshd restart
     The examples in the rest of this deck are run as root.
  9. Section: Installing Hadoop
  10. Hadoop can be deployed in three modes: Local (Standalone) Mode,
      Pseudo-Distributed Mode, and Fully-Distributed Mode.
  11. Local (Standalone) Mode (1/7): Download Hadoop from the Apache Hadoop site
      (http://hadoop.apache.org/); this deck uses Hadoop 0.20.2. Fetch and unpack the
      tarball:
      ~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
      ~# tar zxvf hadoop-0.20.2.tar.gz
  12. Local (Standalone) Mode (2/7): Move the unpacked hadoop-0.20.2 directory to /opt
      and rename it hadoop, then edit conf/hadoop-env.sh to tell Hadoop where Java is:
      ~# mv hadoop-0.20.2 /opt/hadoop
      ~# cd /opt/hadoop/
      /hadoop# vi conf/hadoop-env.sh
  13. Local (Standalone) Mode (3/7): In hadoop-env.sh, set JAVA_HOME (export
      JAVA_HOME=/usr/jdk1.7.0_45). On hosts with IPv6 enabled, also add export
      HADOOP_OPTS=-Djava.net.preferIPv4Stack=true to force Hadoop onto IPv4:
      # Command specific options appended to HADOOP_OPTS when specified
      ... ... ...
      export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
      export JAVA_HOME=/usr/jdk1.7.0_45
      export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
  14. Local (Standalone) Mode (4/7): In standalone mode no further configuration is
      needed beyond JAVA_HOME in conf/hadoop-env.sh. Running bin/hadoop with no
      arguments lists the available commands:
      /hadoop# bin/hadoop
      Usage: hadoop [--config confdir] COMMAND
      where COMMAND is one of:
      namenode -format     format the DFS filesystem
      ... ... ...
      or
      CLASSNAME            run the class named CLASSNAME
      Most commands print help when invoked w/o parameters.
  15. Local (Standalone) Mode (5/7): Test the installation with the grep example in
      hadoop-0.20.2-examples.jar. Create an input directory and copy the XML files
      from conf/ into it:
      /hadoop# mkdir input
      /hadoop# cp conf/*.xml input
  16. Local (Standalone) Mode (6/7): Run the grep example over input, writing matches
      of the pattern config[a-z.]+ to output:
      /hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
      /hadoop# cat output/*
      13 configuration
      4 configuration.xsl
      1 configure
  17. Local (Standalone) Mode (7/7): The grep example fails if the output directory
      already exists, so remove it before re-running the job:
      /hadoop# rm -rf output
  18. Pseudo-Distributed Mode (1/9): Pseudo-distributed mode starts from the standalone
      setup and additionally configures three files under conf/: core-site.xml,
      hdfs-site.xml, and mapred-site.xml. Start with core-site.xml:
      /hadoop# vi conf/core-site.xml
  19. Pseudo-Distributed Mode (2/9): In core-site.xml, set fs.default.name to
      hdfs://localhost:9000 so Hadoop uses the local HDFS namenode as its default
      filesystem.
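With that property filled in, conf/core-site.xml would read as follows (standard
Hadoop 0.20-era configuration format; the value is the one given on the slide):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Default filesystem: the local HDFS namenode -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```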
  20. Pseudo-Distributed Mode (3/9): Edit hdfs-site.xml:
      /hadoop# vi conf/hdfs-site.xml
      and set dfs.replication to 1; with a single datanode, HDFS can hold only one
      replica of each block.
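In the same configuration format, conf/hdfs-site.xml becomes:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- One replica per block: this pseudo-cluster has a single datanode -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```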
  21. Pseudo-Distributed Mode (4/9): Edit mapred-site.xml:
      /hadoop# vi conf/mapred-site.xml
      and set mapred.job.tracker to localhost:9001, the address of the local
      jobtracker.
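Likewise, conf/mapred-site.xml becomes:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- MapReduce jobs are submitted to the local jobtracker -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```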
  22. Pseudo-Distributed Mode (5/9): Hadoop's control scripts use ssh, so confirm that
      ssh to localhost works (answer yes when asked, then enter the password):
      ~# ssh localhost
      The authenticity of host 'localhost (127.0.0.1)' can't be established.
      RSA key fingerprint is ...
      Are you sure you want to continue connecting (yes/no)? yes
      Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
      root@localhost's password:
  23. Pseudo-Distributed Mode (6/9): To log in without a password, press Ctrl + C to
      leave the prompt, then generate a passphrase-less key pair and authorize it:
      ~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
      ~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
      ssh to localhost should now succeed without a password; exit ends the session:
      ~# ssh localhost
      Last login: Mon May 16 10:04:39 2011 from localhost
      ~# exit
  24. Pseudo-Distributed Mode (7/9): Before starting Hadoop for the first time, format
      HDFS with bin/hadoop namenode -format:
      /hadoop# bin/hadoop namenode -format
      11/05/16 10:20:27 INFO namenode.NameNode: STARTUP_MSG:
      /************************************************************
      STARTUP_MSG: Starting NameNode
      ... ... ...
      11/05/16 10:20:28 INFO namenode.NameNode: SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
      ************************************************************/
  25. Pseudo-Distributed Mode (8/9): Start the daemons (namenode, datanode, secondary
      namenode, jobtracker, and tasktracker) with bin/start-all.sh:
      /hadoop# bin/start-all.sh
      starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
      localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host01.out
      localhost: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-root-secondarynamenode-Host01.out
      starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
      localhost: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host01.out
  26. Pseudo-Distributed Mode (9/9): Re-run the grep example, this time against HDFS.
      Upload conf/ into HDFS as input with bin/hadoop fs -put, then submit the job:
      /hadoop# bin/hadoop fs -put conf input
      /hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
  27. Fully-Distributed Mode (1/14): This walkthrough uses two machines, both with
      Java and ssh installed, one master and one slave:
      Host01   Namenode + Jobtracker    192.168.1.1
      Host02   Datanode + Tasktracker   192.168.1.2
  28. Fully-Distributed Mode (2/14): Stop any running Hadoop with stop-all.sh, then
      remove the old installation, ssh keys, and temporary files:
      /hadoop# /opt/hadoop/bin/stop-all.sh
      ~# rm -rf /opt/hadoop
      ~# rm -rf ~/.ssh
      ~# rm -rf /tmp/*
      On Host01, download Hadoop 0.20.2 again and install it under /opt/hadoop:
      ~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
      ~# tar zxvf hadoop-0.20.2.tar.gz
      ~# mv hadoop-0.20.2 /opt/hadoop
      or, to install under /usr instead:
      ~# mv hadoop-0.20.2 /usr/hadoop
  29. Fully-Distributed Mode (3/14): As before, point Hadoop at the JDK. From
      /opt/hadoop, edit conf/hadoop-env.sh:
      ~# cd /opt/hadoop/
      /hadoop# vi conf/hadoop-env.sh
      and set JAVA_HOME:
      # Command specific options appended to HADOOP_OPTS when specified
      export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
      export JAVA_HOME=/usr/jdk1.7.0_45
  30. Fully-Distributed Mode (4/14): Edit conf/core-site.xml:
      /hadoop# vi conf/core-site.xml
  31. Fully-Distributed Mode (5/14): In core-site.xml, set fs.default.name to
      hdfs://Host01:9000 and hadoop.tmp.dir to /var/hadoop/hadoop-${user.name}.
  32. Fully-Distributed Mode (6/14): Edit conf/hdfs-site.xml:
      /hadoop# vi conf/hdfs-site.xml
      and set dfs.replication to 2.
  33. Fully-Distributed Mode (7/14): Edit conf/mapred-site.xml:
      /hadoop# vi conf/mapred-site.xml
      and set mapred.job.tracker to Host01:9001.
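Collecting the values from the three slides above, the fully-distributed
configuration files read (same 0.20-style format as in pseudo-distributed mode):

```xml
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Host01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Host01:9001</value>
  </property>
</configuration>
```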
  34. Fully-Distributed Mode (8/14): Edit conf/masters:
      /hadoop# vi conf/masters
      then edit conf/slaves, replacing the default localhost with Host02:
      /hadoop# vi conf/slaves
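In Hadoop 0.20, conf/masters lists the hosts that run the secondary namenode and
conf/slaves lists the worker hosts. Assuming the secondary namenode stays on Host01
(the slide does not say), the two files end up one hostname each:

```
conf/masters:
Host01

conf/slaves:
Host02
```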
  35. Fully-Distributed Mode (9/14): Set up passwordless ssh and push the keys to the
      slave with scp:
      ~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
      ~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
      ~# scp -r ~/.ssh Host02:~/
      Verify in both directions:
      ~# ssh Host02        (from Host01 into Host02)
      ~# ssh Host01        (from Host02 back into Host01)
      ~# exit              (back to Host02)
      ~# exit              (back to the starting shell on Host01)
  36. Fully-Distributed Mode (10/14): Every node needs the same Hadoop installation; a
      shared NFS mount also works. Copy the configured tree from Host01 to Host02:
      ~# scp -r /opt/hadoop Host02:/opt/
      Then format HDFS from Host01:
      /hadoop# bin/hadoop namenode -format
  37. Fully-Distributed Mode (11/14):
      11/05/16 21:52:13 INFO namenode.NameNode: STARTUP_MSG:
      /************************************************************
      STARTUP_MSG: Starting NameNode
      STARTUP_MSG: host = Host01/127.0.0.1
      STARTUP_MSG: args = [-format]
      STARTUP_MSG: version = 0.20.2
      STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
      ************************************************************/
      11/05/16 21:52:13 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
      ... ...
      11/05/16 21:52:13 INFO namenode.NameNode: SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down NameNode at Host01/127.0.0.1
      ************************************************************/
  38. Fully-Distributed Mode (12/14): Start the cluster from Host01:
      /hadoop# bin/start-all.sh
      starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
      Host02: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host02.out
      starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
      Host02: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host02.out
  39. Fully-Distributed Mode (13/14): Check the cluster with bin/hadoop dfsadmin
      -report, which shows HDFS capacity and datanode status:
      /hadoop# bin/hadoop dfsadmin -report
      Configured Capacity: 9231007744 (8.6 GB)
      ... ...
      Blocks with corrupt replicas: 0
      Missing blocks: 0
      -------------------------------------------------
      Datanodes available: 1 (1 total, 0 dead)
      ... ...
      DFS Remaining%: 41.88%
      Last contact: Mon May 16 22:15:03 CST 2011
  40. Fully-Distributed Mode (14/14): Run the grep example on the cluster: create an
      input directory in HDFS, upload conf/, run the job, and print the result:
      /hadoop# bin/hadoop fs -mkdir input
      /hadoop# bin/hadoop fs -put conf/* input/
      /hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
      /hadoop# bin/hadoop fs -cat output/part-00000
      19 configuration
      6 configuration.xsl
      1 configure
  41. Section: Installing HBase
  42. HBase (1/9): HBase runs on top of the Hadoop cluster built above. This deck uses
      HBase 0.90.2, with HBase managing its own ZooKeeper quorum; cluster nodes should
      keep their clocks synchronized (for example with NTP).
  43. HBase (2/9): Download HBase from the HBase site (http://hbase.apache.org/),
      unpack it, and move it to /opt/hbase:
      ~# wget http://apache.cs.pu.edu.tw//hbase/hbase-0.90.2/hbase-0.90.2.tar.gz
      ~# tar zxvf hbase-0.90.2.tar.gz
      ~# mv hbase-0.90.2 /opt/hbase
      ~# cd /opt/hbase/
      Then edit conf/hbase-env.sh:
      /hbase# vi conf/hbase-env.sh
  44. HBase (3/9): In conf/hbase-env.sh set:
      export JAVA_HOME=/usr/jdk1.7.0_45
      export HBASE_MANAGES_ZK=true
      export HBASE_LOG_DIR=/tmp/hadoop/hbase-logs
      export HBASE_PID_DIR=/tmp/hadoop/hbase-pids
      HBASE_MANAGES_ZK=true makes HBase start and stop its own ZooKeeper. Next, edit
      conf/hbase-site.xml:
      /hbase# vi conf/hbase-site.xml
  45. HBase (4/9): conf/hbase-site.xml:
      <configuration>
        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://Host01:9000/hbase</value>
        </property>
        <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.clientPort</name>
          <value>2222</value>
        </property>
        <property>
          <name>hbase.zookeeper.quorum</name>
          <value>Host01,Host02</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.dataDir</name>
          <value>/tmp/hadoop/hbase-data</value>
        </property>
  46. HBase (5/9): conf/hbase-site.xml, continued:
        <property>
          <name>hbase.tmp.dir</name>
          <value>/var/hadoop/hbase-${user.name}</value>
        </property>
        <property>
          <name>hbase.master</name>
          <value>Host01:60000</value>
        </property>
      </configuration>
  47. HBase (6/9): Edit conf/regionservers:
      /hbase# vi conf/regionservers
      Like Hadoop's conf/slaves, it lists the slave hosts that run region servers; set
      it to Host02. Then copy the Hadoop configuration into HBase's conf/:
      /hbase# cp /opt/hadoop/conf/core-site.xml conf/
      /hbase# cp /opt/hadoop/conf/mapred-site.xml conf/
      /hbase# cp /opt/hadoop/conf/hdfs-site.xml conf/
  48. HBase (7/9): HBase ships with lib/hadoop-core-0.20-append-r1056497.jar; replace
      it with the cluster's own hadoop-0.20.2-core.jar so HBase and Hadoop use
      matching jars:
      /hbase# rm lib/hadoop-core-0.20-append-r1056497.jar
      /hbase# cp /opt/hadoop/hadoop-0.20.2-core.jar ./lib/
      Then copy the HBase tree to the slave:
      /hbase# scp -r /opt/hbase Host02:/opt/hbase
  49. HBase (8/9): Start HBase:
      /hbase# bin/start-hbase.sh
      Host02: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host02.out
      Host01: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host01.out
      starting master, logging to /tmp/hadoop/hbase-logs/hbase-root-master-Host01.out
      Host02: starting regionserver, logging to /tmp/hadoop/hbase-logs/hbase-root-regionserver-Host02.out
  50. HBase (9/9): Open the HBase shell and run list to confirm HBase is up:
      /hbase# bin/hbase shell
      HBase Shell; enter 'help' for list of supported commands.
      Type "exit" to leave the HBase Shell
      Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011
      hbase(main):001:0> list
      TABLE
      0 row(s) in 0.3950 seconds
      hbase(main):002:0>
  51. Section: Common Hadoop Commands
  52. Common Hadoop Commands (1/7):
      bin/start-all.sh                                    start all Hadoop daemons
      bin/stop-all.sh                                     stop all Hadoop daemons
      bin/hadoop version                                  show the Hadoop version
      bin/hadoop dfsadmin -report                         report HDFS status
      bin/hadoop namenode -format                         format HDFS
      bin/hadoop fs -ls                                   list files in HDFS
      bin/hadoop fs -ls /user/root/input                  list a specific HDFS directory
      bin/hadoop fs -mkdir /user/root/tmp                 create an HDFS directory
      bin/hadoop fs -put conf/* /user/root/tmp            upload local files to HDFS
      bin/hadoop fs -cat /user/root/tmp/core-site.xml     print an HDFS file
      bin/hadoop fs -get /user/root/tmp/core-site.xml /opt/hadoop/   download from HDFS
      bin/hadoop fs -rm /user/root/tmp/core-site.xml      delete an HDFS file
      bin/hadoop fs -rmr /user/root/tmp                   delete an HDFS directory recursively
  53. Common Hadoop Commands (2/7): Running bin/hadoop fs with no arguments prints the
      HDFS shell usage:
      /hadoop# bin/hadoop fs
      Usage: java FsShell
      [-ls ]
      [-lsr ]
      [-du ]
      ... ... ...
      -files specify comma separated files to be copied to the map reduce cluster
      -libjars specify comma separated jar files to include in the classpath.
      -archives specify comma separated archives to be unarchived on the compute machines.
      The general command line syntax is
      bin/hadoop command [genericOptions] [commandOptions]
  54. Common Hadoop Commands (3/7): MapReduce jobs are packaged as jars and submitted
      to Hadoop with:
      bin/hadoop jar [MapReduce job jar] [job name] [job arguments]
      Running the bundled examples jar without a job name lists the available examples,
      including grep, wordcount, and pi:
      /hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar
  55. Common Hadoop Commands (4/7): Jars shipped with Hadoop:
      hadoop-0.20.2-core.jar   Hadoop common, HDFS, and MapReduce classes
      hadoop-0.20.2-test.jar   Hadoop test code
      hadoop-0.20.2-ant.jar    Ant tasks for Hadoop
  56. Common Hadoop Commands (5/7): bin/hadoop job -list all lists submitted jobs:
      /hadoop# bin/hadoop job -list all
      5 jobs submitted
      States are: Running : 1 Succeded : 2 Failed : 3 Prep : 4
      JobId State StartTime UserName Priority SchedulingInfo
      job_201105162211_0001 2 1305555169692 root NORMAL NA
      job_201105162211_0002 2 1305555869142 root NORMAL NA
      job_201105162211_0003 2 1305555912626 root NORMAL NA
      job_201105162211_0004 2 1305633307809 root NORMAL NA
      job_201105162211_0005 2 1305633347357 root NORMAL NA
  57. Common Hadoop Commands (6/7): Query a job's status with bin/hadoop job -status
      [JobID]:
      /hadoop# bin/hadoop job -status job_201105162211_0001
      Review a completed job's history with bin/hadoop job -history [output directory]:
      /hadoop# bin/hadoop job -history /user/root/output
      Hadoop job: job_201105162211_0007
      =====================================
      Job tracker host name: Host01
      job tracker start time: Mon May 16 22:11:01 CST 2011
      User: root
      JobName: grep-sort
  58. Common Hadoop Commands (7/7): bin/hadoop job with no arguments prints the full
      job-control usage:
      /hadoop# bin/hadoop job
      Usage: JobClient
      [-submit ]
      [-status ]
      [-counter ]
      [-kill ]
      [-set-priority ]. Valid values for priorities are: VERY_HIGH HIGH NORMAL LOW VERY_LOW
      ... ... ...
      The general command line syntax is
      bin/hadoop command [genericOptions] [commandOptions]
  59. Section: Basic HBase Shell Operations
  60. HBase Shell (1/10): The examples below build the following table, with rows John
      and Adam, a studentid column, and a course column family holding math and
      history qualifiers:
      name   student ID   course:math   course:history
      John   1            80            85
      Adam   2            75            90
      Start the shell with bin/hbase shell:
      /hbase# bin/hbase shell
      HBase Shell; enter 'help' for list of supported commands.
      Type "exit" to leave the HBase Shell
      Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011
      hbase(main):001:0>
  61. HBase Shell (2/10): Create a scores table with column families studentid and
      course:
      > create [table], [column 1], [column 2], ...
      hbase(main):001:0> create 'scores', 'studentid', 'course'
      0 row(s) in 1.8970 seconds
      list shows the tables now in HBase:
      hbase(main):002:0> list
      TABLE
      scores
      1 row(s) in 0.0170 seconds
  62. HBase Shell (3/10): describe shows a table's schema:
      > describe [table]
      hbase(main):003:0> describe 'scores'
      DESCRIPTION ENABLED
      BLOCKCACHE => 'true'}]}
      1 row(s) in 0.0260 seconds
      Insert a value with put; store 1 in the studentid column of row John in scores:
      > put [table], [row], [column], [value]
      hbase(main):004:0> put 'scores', 'John', 'studentid:', '1'
      0 row(s) in 0.0600 seconds
  63. HBase Shell (4/10): Store 80 in John's course:math column:
      hbase(main):005:0> put 'scores', 'John', 'course:math', '80'
      0 row(s) in 0.0100 seconds
      and 85 in John's course:history column:
      hbase(main):006:0> put 'scores', 'John', 'course:history', '85'
      0 row(s) in 0.0080 seconds
  64. HBase Shell (5/10): Likewise for Adam: studentid 2, course:math 75, and
      course:history 90:
      hbase(main):007:0> put 'scores', 'Adam', 'studentid:', '2'
      0 row(s) in 0.0130 seconds
      hbase(main):008:0> put 'scores', 'Adam', 'course:math', '75'
      0 row(s) in 0.0100 seconds
      hbase(main):009:0> put 'scores', 'Adam', 'course:history', '90'
      0 row(s) in 0.0080 seconds
  65. HBase Shell (6/10): scan dumps the whole scores table:
      > scan [table]
      hbase(main):011:0> scan 'scores'
      ROW COLUMN+CELL
      Adam column=course:history, timestamp=1305704304053, value=90
      Adam column=course:math, timestamp=1305704282591, value=75
      Adam column=studentid:, timestamp=1305704186916, value=2
      John column=course:history, timestamp=1305704046378, value=85
      John column=course:math, timestamp=1305703949662, value=80
      John column=studentid:, timestamp=1305703742527, value=1
      2 row(s) in 0.0420 seconds
  66. HBase Shell (7/10): get fetches a single row, here John:
      > get [table], [row]
      hbase(main):010:0> get 'scores', 'John'
      COLUMN CELL
      course:history timestamp=1305704046378, value=85
      course:math timestamp=1305703949662, value=80
      studentid: timestamp=1305703742527, value=1
      3 row(s) in 0.0440 seconds
  67. HBase Shell (8/10): scan can be restricted to one column family, here course:
      > scan [table], {COLUMNS => [column family]}
      hbase(main):011:0> scan 'scores', {COLUMNS => 'course:'}
      ROW COLUMN+CELL
      Adam column=course:history, timestamp=1305704304053, value=90
      Adam column=course:math, timestamp=1305704282591, value=75
      John column=course:history, timestamp=1305704046378, value=85
      John column=course:math, timestamp=1305703949662, value=80
      2 row(s) in 0.0250 seconds
  68. HBase Shell (9/10): scan also accepts a list of columns:
      > scan [table], {COLUMNS => [[column 1], [column 2], ...]}
      hbase(main):012:0> scan 'scores', {COLUMNS => ['studentid','course:']}
      ROW COLUMN+CELL
      Adam column=course:history, timestamp=1305704304053, value=90
      Adam column=course:math, timestamp=1305704282591, value=75
      Adam column=studentid:, timestamp=1305704186916, value=2
      John column=course:history, timestamp=1305704046378, value=85
      John column=course:math, timestamp=1305703949662, value=80
      John column=studentid:, timestamp=1305703742527, value=1
      2 row(s) in 0.0290 seconds
  69. HBase Shell (10/10): A table must be disabled before it can be dropped:
      hbase(main):003:0> disable 'scores'
      0 row(s) in 2.1510 seconds
      hbase(main):004:0> drop 'scores'
      0 row(s) in 1.7780 seconds
  70. Section: Web Administration Interfaces
  71. Web Interfaces (1/2): Hadoop serves web status pages that can be opened in a
      browser such as Mozilla Firefox: the HDFS namenode at http://localhost:50070 and
      the MapReduce jobtracker at http://localhost:50030.
  72. Web Interfaces (2/2): HBase's pages: the HBase Master on the master host at
      http://localhost:60010/, each Region Server on its slave host at
      http://localhost:60030/, and the ZooKeeper dump on the master at
      http://localhost:60010/zk.jsp.