2023.06.08 (THU) ํ์ต์ ๋ฆฌ
#sqoop
1. Sqoop ๊ฐ๋
sqoop์ RDBMS์ HDFS ์ฌ์ด์์ ๋ฐ์ดํฐ ์ ์ก์ ์ง์ํ๋ ํด๋ก MapReduce๋ฐฉ์์ผ๋ก ๋์
๊ธฐ๋ณธ์ ์ผ๋ก text fileํ์์ผ๋ก HDFS์ ์ ์ฌ๋์ง๋ง, ์ถ๊ฐ ์ค์ ์ผ๋ก sequence file, avro, parquet์ ๋ํ ํ์์ ์ง์
- HDFS ์ ์ฅ์ ๊ธฐ์ค, import(RDBMS -> HDFS), export(HDFS -> RDBMS)
2. sqoop ์ค์น
Sqoop Documentation (v1.4.7)
Sqoop Documentation (v1.4.7) Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this f
sqoop.apache.org
- ์ค์นํ์ผ ๋ค์ด๋ก๋ ๋ฐ ์์ถ ํด์
# ์ค์นํ์ผ ๋ค์ด๋ก๋ ๋ฐ ์์ถ ํด์
wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
tar -xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
# ํ์ผ์ HADOOP ํด๋๋ก ์ด๋
mv sqoop-1.4.7.bin__hadoop-2.6.0 $HADOOP_HOME/
cd $HADOOP_HOME/sqoop-1.4.7.bin__hadoop-2.6.0
- ํ๊ฒฝ๋ณ์ ์ค์
vi ~/.zshrc
# ~/.zshrc์ ์๋ ๋ด์ฉ ์์ฑ
export SQOOP_HOME=$HADOOP_HOME/sqoop-1.4.7.bin__hadoop-2.6.0
export SQOOP_CONF_DIR=$SQOOP_HOME/conf
export PATH=$PATH:$SQOOP_HOME/bin
# ์ฌ์คํ
source ~/.zshrc
- sqoop-env.sh ์ค์ ์ถ๊ฐ
cd $SQOOP_CONF_DIR
cp sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
# sqoop-env.sh ์ ์๋ ๋ด์ฉ ์์ฑ
# hadoop ๊ด๋ จ ์ค์
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HIVE_HOME=$HIVE_HOME
- mysql-connector ๋ค์ด๋ก๋
#mysql version ํ์ธ
mysql -V
์์์ ํ์ธํ mysql ๋ฒ์ ๊ณผ ๋ง๋ ์ค์นํ์ผ์ ์ฐพ์ ๋ค์ด๋ก๋ ๐ mysql-connector-download
cd ~/tmp #์์ ํด๋๋ก ์ด๋ ํ ํด๋น ํด๋์ local์์ ์ค์น๋ ํ์ผ upload
mv /mnt/c/๋ก์ปฌ๊ฒฝ๋ก/Downloads/mysql-connector-j-8.0.33.tar.gz .
tar xvf mysql-connector-j-8.0.33.tar.gz #์์ถ ํด์
cd mysql-connector-j-8.0.33
cp mysql-connector-j-8.0.33.jar $SQOOP_HOME/lib #sqoop ํด๋๋ก ์ด๋
- apache-commons-lang ๋ค์ด๋ก๋
cd ~/tmp #์์ ํด๋๋ก ์ด๋ ํ ํด๋น ํด๋์ local์์ ์ค์น๋ ํ์ผ upload
mv /mnt/c/๋ก์ปฌ๊ฒฝ๋ก/Downloads/commons-lang-2.6-bin.tar.gz .
tar xvf commons-lang-2.6-bin.tar.gz #์์ถ ํด์
cd commons-lang-2.6
cp commons-lang-2.6.jar $SQOOP_HOME/lib #sqoop/lib ํด๋๋ก ์ด๋
3. Sqoop ์ค์ต
- ๋ฐ์ดํฐ๋ฒ ์ด์ค ๋ชฉ๋ก ๋ณด๊ธฐ
# list-databases
$ sqoop list-databases --connect jdbc:mysql://localhost:[mysqlํฌํธ๋ฒํธ] --username [hive username] --password [hive pw]
information_schema
performance_schema
metastore
- ๋ฐ์ดํฐ๋ฒ ์ด์ค ํ ์ด๋ธ ๊ฐ์ ธ์ค๊ธฐ
# import RDBMS -> HDFS # metastore db์์ COLUMNS_V2 ํ
์ด๋ธ ๊ฐ์ ธ์ค๊ธฐ
$ sqoop import --connect jdbc:mysql://localhost:[mysqlํฌํธ๋ฒํธ]/metastore --table COLUMNS_V2 --target-dir [hdfs ๊ฒฝ๋ก] --username [hive username] --P -m 1
$ hdfs dfs -ls [hdfs ๊ฒฝ๋ก]
Found 2 items
-rw-r--r-- 1 tom supergroup 0 2023-06-08 10:50 [hdfs ๊ฒฝ๋ก]/_SUCCESS
-rw-r--r-- 1 tom supergroup 3966 2023-06-08 10:50 [hdfs ๊ฒฝ๋ก]/part-m-00000
4. ๊ธฐํ ์ฐธ๊ณ ์ฌํญ
4-1. mysql port ๋ฒํธ ํ์ธํ๊ธฐ
์ผ๋ฐ์ ์ผ๋ก port ๋ฒํธ๋ 3306์ด๋ ํด๋น ๋ฒํธ๊ฐ ์๋ ๊ฒฝ์ฐ, ์๋๋ฐฉ์์ผ๋ก ํ์ธ ๊ฐ๋ฅ
$ mysql -u root -p # mysql ์ ์
$ mysql > SHOW GLOBAL VARIABLES LIKE 'PORT';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| port | 3306 |
+---------------+-------+
1 row in set (0.01 sec)
'๐ Data > Engineering' ์นดํ ๊ณ ๋ฆฌ์ ๋ค๋ฅธ ๊ธ
| [Airflow] HiveServer2Hook, HiveCliHook ์ฌ์ฉํ์ฌ Hive์ ์ ๊ทผํ๊ธฐ (0) | 2023.06.12 |
|---|---|
| [Airflow] Airflow์ Python Operator / Hive Operator ์ฌ์ฉํ๊ธฐ (0) | 2023.06.09 |
| [Spark] Spark, Zeppelin Notebook ์ฌ์ฉํ๊ธฐ (0) | 2023.06.05 |
| [HIVE] Airflow / Hive๋ฅผ ์ด์ฉํ ๋ฐ์ดํฐ ์ฒ๋ฆฌ (0) | 2023.06.02 |
| [Hive] Hive ์ค์น ๋ฐ ๊ฐ์ (0) | 2023.05.31 |