Simple CLI tool for transferring data between an RDBMS and Hadoop
Pros
- Provides generic JDBC-based connectors for moving data between Hadoop and most database systems
- Generates Java classes from the database records it reads, which can be reused in other code built on Hadoop's client libraries
- Supports both import and export
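As a rough illustration of the import side, the sketch below assembles a Sqoop 1 `sqoop import` command line from Python. The flags are standard Sqoop 1 import arguments, but the JDBC URL, user name, password-file path, table, and target directory are hypothetical placeholders, and the helper `sqoop_import_args` is a name made up for this example.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table,
                      target_dir, num_mappers=4):
    """Build the argument list for a Sqoop 1 table import into HDFS.

    All values passed in are placeholders chosen by the caller;
    nothing is executed here.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to read
        "--target-dir", target_dir,        # HDFS directory to write into
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


# Hypothetical connection details, for illustration only.
import_args = sqoop_import_args(
    "jdbc:mysql://db.example.com/sales",
    "etl_user",
    "/user/etl/.db-password",
    "orders",
    "/data/raw/orders",
)

# Print a shell-safe command line that could be pasted into a terminal.
print(shlex.join(import_args))
```

Building the list first and only then joining it keeps the quoting correct if a value contains spaces, and the same list can be handed to `subprocess.run` on a machine where Sqoop is installed.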
Cons
- Sqoop2 development seems to have stalled. I have set it up outside of a Cloudera CDH installation, and I actually prefer its "Sqoop Server" model to the CLI-only client that is Sqoop1. The server model works especially well in a microservices environment, where there is only one place to maintain the JDBC drivers that Sqoop uses.
Return on Investment
- When combined with Cloudera's HUE, it can enable non-technical users to easily import relational data into Hadoop.
- Being able to manipulate large datasets in Hadoop and then load them into a type of "materialized view" in an external database system has yielded great insights into the Hadoop data lake without continuously running large batch jobs.
- Sqoop isn't very user-friendly for those uncomfortable with a CLI.
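The "materialized view" pattern mentioned above relies on Sqoop's export direction. A minimal sketch of assembling such a `sqoop export` command follows. It assumes a results directory already exists in HDFS and that the target table already exists in the database. The helper name `sqoop_export_args` and all connection details are hypothetical, and whether `--update-mode allowinsert` is honored depends on the connector in use.

```python
def sqoop_export_args(jdbc_url, username, password_file, table,
                      export_dir, update_key=None):
    """Build the argument list for a Sqoop 1 export from HDFS to a table.

    Nothing is executed here; the caller decides how to run the command.
    """
    args = [
        "sqoop", "export",
        "--connect", jdbc_url,             # JDBC URL of the target database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # existing target table
        "--export-dir", export_dir,        # HDFS directory holding the results
    ]
    if update_key is not None:
        # Update rows matching this key column; allowinsert also inserts
        # new rows, but support for it varies by connector (assumption).
        args += ["--update-key", update_key, "--update-mode", "allowinsert"]
    return args


# Hypothetical values, for illustration only.
export_args = sqoop_export_args(
    "jdbc:postgresql://reports.example.com/analytics",
    "etl_user",
    "/user/etl/.db-password",
    "daily_order_summary",
    "/data/summaries/daily_orders",
    update_key="order_date",
)
print(" ".join(export_args))
```

Leaving `update_key` unset produces a plain insert-only export, which suits a summary table that is truncated and reloaded on each run.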
Alternatives Considered
Apache Kafka and Apache Spark
Other Software Used
Apache Kafka, Apache Flume, Apache Spark, Apache Pig, Qubole, Amazon Elastic MapReduce, Cloudera Manager, Amazon Relational Database Service, Amazon S3 (Simple Storage Service), Logstash, Elasticsearch, Splunk Enterprise

