Sqoop is an Apache project that is part of the broader Hadoop
ecosphere. In many ways Sqoop is similar to distcp
Complete the following
steps to transfer data from HDFS to a MySQL table:
1. Create a new database in the MySQL
instance:
CREATE
DATABASE logs;
2. Create the weblogs_from_hdfs table:
USE logs;
CREATE TABLE weblogs_from_hdfs
(
md5 VARCHAR(32),
url VARCHAR(64),
request_date DATE,
request_time TIME,
ip VARCHAR(15)
);
3. Export the weblog_entries.txt file
from HDFS to MySQL:
sqoop export -m 1 --connect
jdbc:mysql://<HOST>:<PORT>/logs --username hdp_usr --password test1
--table weblogs_from_hdfs --export-dir /data/weblogs/05102012
--input-fields-terminated-by '\t' --mysql-delmiters
The output is as follows:
INFO
mapreduce.ExportJobBase: Beginning export of weblogs_from_ hdfs
Sqoop loads the JDBC driver defined in the --connect statement from $SQOOP_HOME/ libs,
where $SQOOP_HOME
is the full path to the location where
Sqoop is installed. The --username and --password options are used to authenticate the user issuing the command
against the MySQL instance.
mysql> USE mysql;
mysql> select host, user
from user;
+---------------+-----------+
| user | host |
+---------------+-----------+
| hdp_usr | hdp01 |
| hdp_usr
| hdp02 |
For example, the
export job can be thought of as reading each line of the weblogs_entries.txt file
in HDFS and producing the following output:
INSERT INTO weblogs_from_hdfs
VALUES('aabba15edcd0c8042a14bf216c5',
'/jcwbtvnkkujo.html', '2012-05- 10', '21:25:44', '148.113.13.214');
INSERT INTO weblogs_from_hdfs
VALUES('e7d3f242f111c1b522137481d8508ab7',
'/ckyhatbpxu.html', '2012- 05-10', '21:11:20', '4.175.198.160');
By default, Sqoop export creates
INSERT statements.
If the --update-key argument
is specified, UPDATE statements
will be created instead. If the preceding example had used the argument --update-key md5,
the generated code would have run like the following:
UPDATE weblogs_from_hdfs SET
url='/jcwbtvnkkujo.html', request_ date='2012-05-10'request_time='21:25:44'
ip='148.113.13.214'WHERE
md5='aabba15edcd0c8042a14bf216c5'
UPDATE
weblogs_from_hdfs SET url='/jcwbtvnkkujo.html', request_
date='2012-05-10'request_time='21:11:20' ip='4.175.198.160' WHERE md5='e7d3f242f111c1b522137481d8508ab7'
No comments:
Post a Comment