Apache Sqoop

Sqoop definition

Sqoop logoApache Sqoop is a command line tool developed to transfer large volumes of data from databases to relate to Hadoop, hence its name that comes from the merger of SQL and Hadoop. Specifically transforms data relating to Hive or Hbase in one direction and the other from HDFS to relational data such as MySQL, Oracle, Postgress or a data warehouse.

The transfer process consists of reading row by row each table in the database and importing them to HDFS, the output of these is a set of files that can be in CSV, Avro, binary or sequence format.

Sqoop2 has recently been created to improve the usability of Sqoop, providing a Web application and greater integration with Oozie.

 

Features

  • Provides a Java API to perform ingest processing. Allowing to program applications that perform some type of transformation on the data.
  • Provides commands for listing tables and schemas.
  • Supports incremental data loads.
  • Provides many connectors such as FTP, JDBC, Kafka, Kite, SFTP

 

Example of how to pass data between MySQL and HDFS using Sqoop

Prerequisites

Have installed Hortonworks (Virtual Box) for Spark

Creating a MySQL database table

A database is created with a table on which to perform the tests, the following commands will be used.

accessing MYSQL

$ mysql
You can make an ERROR 1045 (28000): Access denied for user ‘ root ‘ @ ‘ localhost, which is resolved:
mysql -u root-P

Note: Keep in mind that the MySQL and system wash can be different.

Consult database

mysql> show databases;

Create Database

mysql> create database myddbb;

Use database

mysql> use myddbb;

Create base table

mysql> CREATE TABLE mytable (
         id MEDIUMINT NOT NULL AUTO_INCREMENT,
         name CHAR (30) NOT NULL,
         age INTEGER (30),
         salary INTEGER (30),
         PRIMARY KEY (id));

CREATE TABLE 2

mysql> CREATE TABLE mytable2 (
         id MEDIUMINT NOT NULL AUTO_INCREMENT,
         name CHAR (30) NOT NULL,
         age INTEGER (30),
         salary INTEGER (30),
         PRIMARY KEY (id));

Insert Data

mysql> INSERT INTO mytable (name, age, salary) values
        ("Peter", 24, 21000), ("Maria", 26, 24000), ("John", 28, 25000), ("Louis", 35, 28000), ("Monica", 42, 30000), ("Rose", 43, 25000), ("Susana", 45, 39000);

Note: If you have permission problems enter MySQL and give all permissions:

grant all privileges on *.* to 'root'@'localhost' IDENTIFIED BY 'MySQL_Key' WITH GRANT OPTION;

Exit MYSQL

mysql> exit;

 

Load MySQL data to HDFS

Example of loading data from the table “MyTable” of the Database “MIBBDD” to the folder HDFs name “Mitabla_hdfs”

MySQL to HDFs

$ sqoop Import \
 --connect jdbc:mysql://localhost/myddbb \
 --username = root -P \
 --table = mytable \
 --driver = com.mysql.jdbc.driver \
 --target-dir =/my_hdfs_table \
 --fields-terminated-by = ',' \
 --lines-terminated-by '\n'

Target-dir: File HDFS where it is stored.
Table: Identifies the table to be copied.
Clear-Staging-table: Indicates that past data can be deleted.
VERBOSE: Prints additional information to facilitate debugging.
Fields-terminated-by: defining the delimiter.

 

Loading data from HDFS to MySQL

Example of loading data from the HDFS folder named “my_hdfs_table” to the “mytable2” table in the “myddbb” database.

HDFs to MySQL

$ sqoop export 
 --connect jdbc: mysql://localhost/myddbb 
 --username = root -P 
 --table = mytable2 
 --export-dir =/my_hdfs_table -m 1

Note: If you have problems “set $ ACCUMULO_HOME to the root in your ACCUMULO intallation”, it can be avoided with:

$ ACCUMULO_HOME = '/var/lib/accumulo'
$ Export ACCUMULO_HOME
$ sudo mkdir/var/lib/accumulo

To practice with Sqoop then consult: “Examples of Sqoop”, in this section compiled many examples that can serve you useful.

Source: official Shell documentation

Source: official documentation for JAVA API