Saturday, 28 December 2013

Importing data from MySQL into HDFS using Sqoop

Sqoop is an Apache project that is part of the broader Hadoop ecosphere. In many ways Sqoop is similar to distcp (See the Moving data efficiently between clusters using Distributed Copy recipe of this chapter). Both are built on top of MapReduce and take advantage of its parallelism and fault tolerance
The MySQL JDBC driver JAR file has been copied to $SQOOP_HOME/libs. The driver can be downloaded from http://dev.mysql.com/downloads/connector/j/.
How to do it...
Complete the following steps to transfer data from a MySQL table to an HDFS file:
1. Create a new database in the MySQL instance:
CREATE DATABASE logs;
2. Create and load the weblogs table:
USE logs;
CREATE TABLE weblogs(
md5 VARCHAR(32),
url VARCHAR(64),
request_date DATE,
request_time TIME,
ip VARCHAR(15)
);
LOAD DATA INFILE '/path/weblog_entries.txt' INTO TABLE weblogs
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n';
3. Select a count of rows from the weblogs table:
mysql> select count(*) from weblogs;
The output would be:
+----------+
| count(*) |
+----------+
| 3000 |
+----------+
1 row in set (0.01 sec)
4. Import the data from MySQL to HDFS:
sqoop import -m 1 --connect jdbc:mysql://<HOST>:<PORT>/logs --username hdp_usr --password test1 --table weblogs --target-dir / data/weblogs/import
The output would be:
INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-jon/compile/f57ad8b208643698f3d01954eedb2e4d/weblogs. jar
WARN manager.MySQLManager: It looks like you are importing from mysql.
How it works...
Sqoop loads the JDBC driver defined in the --connect statement from $SQOOP_HOME/libs, where $SQOOP_HOME is the full path to the location where Sqoop is installed. The --username and --password options are used to authenticate the user issuing the command against the MySQL instance. The mysql.user table must have an entry for the --username option and the host of each node in the Hadoop cluster; or else Sqoop will throw an exception indicating that the host is not allowed to connect to the MySQL Server.
mysql> USE mysql;
mysql> select host, user from user;
The output would be:
+------------+-----------+
| user | host |
+------------+-----------+
| hdp_usr | hdp01 |

| hdp_usr | hdp02 |

No comments:

Post a Comment