Sqoop is an Apache project that is part of the broader Hadoop
ecosphere. In many ways Sqoop is similar to distcp (See the Moving
data efficiently between clusters using Distributed Copy recipe of this
chapter). Both are built on top of MapReduce and take advantage of its
parallelism and fault tolerance
The MySQL JDBC driver JAR file has been
copied to $SQOOP_HOME/libs. The driver can be downloaded from http://dev.mysql.com/downloads/connector/j/.
How to do it...
Complete the following
steps to transfer data from a MySQL table to an HDFS file:
1. Create a new database in the MySQL instance:
CREATE DATABASE logs;
2. Create and load the weblogs table:
USE logs;
CREATE TABLE weblogs(
md5 VARCHAR(32),
url VARCHAR(64),
request_date DATE,
request_time TIME,
ip VARCHAR(15)
);
LOAD DATA INFILE
'/path/weblog_entries.txt' INTO TABLE weblogs
FIELDS
TERMINATED BY '\t' LINES TERMINATED BY '\r\n';
3. Select a count of rows from the weblogs
table:
mysql> select count(*) from
weblogs;
The output would be:
+----------+
| count(*) |
+----------+
| 3000 |
+----------+
1 row in set (0.01 sec)
4. Import the data from MySQL to HDFS:
sqoop import -m 1 --connect
jdbc:mysql://<HOST>:<PORT>/logs --username hdp_usr --password test1
--table weblogs --target-dir / data/weblogs/import
The output would be:
INFO orm.CompilationManager:
Writing jar file:
/tmp/sqoop-jon/compile/f57ad8b208643698f3d01954eedb2e4d/weblogs.
jar
WARN
manager.MySQLManager: It looks like you are importing from mysql.
How it works...
Sqoop loads the JDBC
driver defined in the --connect statement from $SQOOP_HOME/libs, where $SQOOP_HOME is the full path to
the location where Sqoop is installed. The --username and --password options are used to
authenticate the user issuing the command against the MySQL instance. The mysql.user table must have an
entry for the --username option and the host of
each node in the Hadoop cluster; or else Sqoop will throw an exception indicating
that the host is not allowed to connect to the MySQL Server.
mysql> USE mysql;
mysql> select host, user
from user;
The output would be:
+------------+-----------+
| user | host |
+------------+-----------+
| hdp_usr | hdp01 |
| hdp_usr
| hdp02 |
No comments:
Post a Comment