Apache Sqoop 陳威宇

Apache Sqoop, 陳威宇. Sqoop: the bridge between RDBs and Hadoop. Apache Sqoop is a “tool” designed to transfer data between Hadoop and structured datastores, pulling data from sources such as RDBMSs.



Page 1:

Apache Sqoop

陳威宇

Page 2:

Sqoop: the bridge between RDBs and Hadoop

• Apache Sqoop is a “tool” designed to transfer data between Hadoop and structured datastores.

• Pulls data from: RDBMS, data warehouses, NoSQL stores

• Writes data to: Hive, HBase

• Uses the MapReduce framework to transfer data in parallel

Figure source: http://bigdataanalyticsnews.com/data-transfer-mysql-cassandra-using-sqoop/
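The parallel transfer mentioned above is driven by the `-m` (`--num-mappers`) flag: Sqoop splits the source table into that many ranges on its primary key, or on a column named with `--split-by`, and runs one map task per range. A minimal sketch that only prints the command for inspection; the connect string and the `id` split column are assumptions borrowed from the exercises later in this deck:

```shell
# Build the import command in a variable so it can be reviewed before
# running it (the sqoop binary itself is assumed to be on $PATH).
CMD="sqoop import \
  --connect jdbc:mysql://localhost/books \
  --username root -P \
  --table authors \
  --split-by id \
  -m 4"
echo "$CMD"   # dry run: show what would be executed
# eval "$CMD" # uncomment to actually run the import
```

With `-m 4`, Sqoop queries the minimum and maximum of the split column and divides that range into four slices, one per map task; tables without a primary key need an explicit `--split-by` or `-m 1`.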

Page 3:

How to use Sqoop

Figure source: http://hive.3du.me/slide.html

Page 4:

Connecting Sqoop to the elephant (setup)

• Unpack http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.5-cdh5.3.2.tar.gz

• Edit ~/.bashrc

• Edit conf/sqoop-env.sh

• Start sqoop

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_HOME=/home/hadoop/hive
export SQOOP_HOME=/home/hadoop/sqoop
export HCAT_HOME=${HIVE_HOME}/hcatalog/
export PATH=$PATH:$SQOOP_HOME/bin

$ sqoop
Try 'sqoop help' for usage.

export HADOOP_COMMON_HOME=/home/hadoop/hadoop
export HBASE_HOME=/home/hadoop/hbase
export HIVE_HOME=/home/hadoop/hive

Page 5:

Exercise 1: import to Hive

cd ~
git clone https://github.com/waue0920/hadoop_example.git
cd hadoop_example/sqoop/ex1
mysql -u root -phadoop < ./exc1.sql
hadoop fs -rmr /user/hadoop/authors
sqoop import --connect jdbc:mysql://localhost/books \
  --username root --table authors --password hadoop --hive-import -m 1

Exercise: use Hive to check that the data was imported:
hive> select * from authors;
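One caveat about the import command above: `--password hadoop` leaves the password visible in shell history and in `ps` output. Sqoop also accepts `-P` (interactive prompt) or `--password-file`. A sketch of the file-based variant; the `/tmp` path is an assumption for illustration only:

```shell
# Write the password to a private file. printf avoids a trailing newline,
# which Sqoop would otherwise treat as part of the password.
rm -f /tmp/sqoop.pwd
printf 'hadoop' > /tmp/sqoop.pwd
chmod 400 /tmp/sqoop.pwd

CMD="sqoop import --connect jdbc:mysql://localhost/books \
  --username root --password-file file:///tmp/sqoop.pwd \
  --table authors --hive-import -m 1"
echo "$CMD"   # dry run: inspect the command before executing it
```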

Page 6:

Exercise 1: create a job

hadoop fs -rmr /user/hadoop/authors
sqoop job --create myjob1 -- import --connect jdbc:mysql://localhost/books \
  --username root --table authors -P --hive-import -m 1
sqoop job --list
sqoop job --show myjob1
sqoop job --exec myjob1

Exercise: use Hive to check that the data was imported:
hive> select * from authors;
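Saved jobs are most useful when combined with incremental imports: Sqoop stores the last imported value in its job metastore, so each `--exec` picks up only rows added since the previous run. A sketch, assuming the `authors` table has a monotonically increasing `id` column:

```shell
# --incremental append re-imports only rows whose id exceeds the stored
# --last-value; Sqoop updates that value after each successful run.
CMD="sqoop job --create myjob_incr -- import \
  --connect jdbc:mysql://localhost/books \
  --username root -P --table authors \
  --incremental append --check-column id --last-value 0 \
  -m 1"
echo "$CMD"   # dry run: inspect the command before executing it
```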

Page 7:

Exercise 2: export to MySQL

cd ~/hadoop_example/sqoop/ex2
mysql -u root -phadoop < ./create.sql
./update_hdfs_data.sh
sqoop export --connect jdbc:mysql://localhost/db \
  --username root --password hadoop --table employee \
  --export-dir /user/hadoop/sqoop_input/emp_data
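`sqoop export` generates INSERT statements by default, so re-running the exercise against an already-populated `employee` table fails on duplicate keys. When the target table has a primary key, `--update-key` switches Sqoop to UPDATE statements, and `--update-mode allowinsert` additionally inserts rows that do not exist yet. A sketch; the `id` key column is an assumption:

```shell
# Upsert-style export: update existing rows by id, insert new ones.
CMD="sqoop export --connect jdbc:mysql://localhost/db \
  --username root --password hadoop --table employee \
  --export-dir /user/hadoop/sqoop_input/emp_data \
  --update-key id --update-mode allowinsert"
echo "$CMD"   # dry run: inspect the command before executing it
```

Whether `allowinsert` is supported depends on the JDBC connector; the MySQL connector handles it, while some others only accept the default `updateonly` mode.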