GitHub - Wzhipeng/Hive-JDBC-Storage-Handler: Hive Storage Handler for JDBC

#Hive Storage Handler for JDBC#

The Hive Storage Handler For JDBC by Qubole, which is a fork of HiveJdbcStorageHandler, helps users read from and write to JDBC databases using Hive, and also enabling them to run SQL queries to analyze data that resides in JDBC tables. Optimizations such as FilterPushDown have also been added.

##Building from Source##

Download the code from Github:

  $ git clone https://github.com/qubole/Hive-JDBC-storage-Handler.git
  $ cd Hive-JDBC-storage-Handler

Build using Maven (add -DskipTests to build without running tests):

  $ mvn clean install -Phadoop-1

The JARs for the storage handler can be found in the target/ folder. Use qubole-hive-JDBC-0.0.4.jar in the hive session (see below).

##Usage##

Add the JAR to the Hive session. <path-to-jar> is the path to the above mentioned JAR. For using this with Qubole hive, upload the JAR to an S3 bucket and provide its path.

  ADD JAR <path-to-jar>;

Each record in the JDBC corresponds to a row in the Hive table.
While creating the Hive table, use

  STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'

##Table Creation## Table can be created in 2 ways:

First is, Column mappings can be explicitly given along with the table creation statement.

DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable(
  id INT,
  id_double DOUBLE,
  names STRING,
  test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name"="JDBCTable",
  "mapred.jdbc.password"="",
  "mapred.jdbc.hive.lazy.split"= "false"
);

Second is, no table mappings are specified, SerDe class automatically generates those mappings.

CREATE EXTERNAL TABLE HiveTable
row format serde 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcSerDe'
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name" = "JDBCTable",
  "mapred.jdbc.password"="",
  "mapred.jdbc.hive.lazy.split"= "false"
);

NOTE: "mapred.jdbc.hive.lazy.split"= "true" property enables 
       split computation to be done by mappers internally.

##Sample Queries##

HIVE-JDBC Storage Handeler supports alomost all types of possible SQL queries. Some examples of supported queries are:

####Queries to Read from DB ####

* select * from HiveTable;
* Select count(*) from HiveTable;
* select id from HiveTable where id > 50000;
* select names from HiveTable;
* select * from HiveTable where names like ‘D%’;
* SELECT * FROM HiveTable ORDER BY id DESC;

####Group By Queries ####

* select id, sum(id_double) as sum_double from HiveTable group by id;

Join Queries

* select HiveTable_1.*, HiveTable_2.* from HiveTable_1 a join HiveTable_2 b 
  on (a.id = b.id) where a.id > 90000 and b.id > 97000 ;

Queries to insert data into DB

* Insert Into Table HiveTable_1 select * from HiveTable_2;
*Insert Into Table HiveTable_1 select * from HiveTable_2 
 where id > 50000 and upper(names) = 'ANN';

Support For FilterPushDown

Support for FilterPushDown has been added to the jar as described in the following [wiki] (https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev)
To disable FilterPushDown

 set hive.optimize.ppd = false;

##Contributions##

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
src/main		src/main
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Release-Notes.txt		Release-Notes.txt
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Join Queries

Queries to insert data into DB

Support For FilterPushDown

About

Releases

Packages

Languages

License

Wzhipeng/Hive-JDBC-Storage-Handler

Folders and files

Latest commit

History

Repository files navigation

Join Queries

Queries to insert data into DB

Support For FilterPushDown

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages