Drill JDBC Driver Features
The Drill JDBC driver provides the following features:
- Catalog and schema support
- The Drill JDBC driver supports both catalogs and schemas in order to work with various JDBC applications. The Drill JDBC driver follows convention, adding a synthetic catalog named DRILL under which all of the schemas/databases are organized. The driver also maps the JDBC schema to the DRILL catalog.
- Schema types and file formats
- The Drill JDBC driver supports the following schema types:
- HBase
- Hive
- Distributed File System (DFS), supporting the following file formats:
- Parquet
- JavaScript Object Notation (JSON)
- Comma-Separated Values (CSV)
- Tab-Separated Values (TSV)
- Specifying column names
Use of the SQL asterisk (*) selector to return all columns in a table is limited; you
must modify queries to specify the columns to return. Refer to the following syntax
guidelines for each schema type or file format and modify your queries as needed. When
you write queries for Drill, you must enclose all table and schema names in backticks
(`). NOTE: The MapR ODBC Driver for Drill provides an application called Drill Explorer that helps generate queries for Drill. The Drill JDBC driver does not currently support Drill Explorer. To simplify the process of formulating valid queries for Drill, consider using Drill Explorer with the ODBC driver.
- HBase
- The Drill JDBC driver presents an HBase column-family as a JDBC column. An HBase column
is derived using the lookup scheme ColumnFamily['Column'] where ColumnFamily
is an HBase column-family and Column is an HBase column contained in
ColumnFamily. You must specify individual HBase columns, not column families, so
you need some familiarity with the data. For example, the following query may
return incorrect results:
SELECT * FROM `hbase`.`students`
When the query is modified to specify individual columns, the query results are correct:
SELECT CAST(account['name'] AS varchar(20)) FROM `hbase`.`students`
- Parquet
- When a query is used as a subquery, such as when you use a SQL view to connect to a data
source using a BI tool, you must specify column names in the query. For example, the BI
tool may not connect to your Drill data source successfully when using the following
query:
SELECT * FROM `dfs`.`default`.`./opt/drill/test.parquet`
When the query is modified to specify column names, the BI tool connects to your Drill data source successfully using the SQL view:
SELECT CAST(column1 AS varchar(20)) FROM `dfs`.`default`.`./opt/drill/test.parquet`
- JSON
- When a query is used as a subquery, such as when you use a SQL view to connect to a data
source using a BI tool, you must specify column names in the query. For example, the BI
tool may not connect to your Drill data source successfully when using the following
query:
SELECT * FROM `dfs`.`default`.`./opt/drill/interval.json`
When the query is modified to specify column names, the BI tool connects to your Drill data source successfully using the SQL view:
SELECT column1 FROM `dfs`.`default`.`./opt/drill/interval.json`
To query nested elements, use the following syntax, where menu is a child of column1:
SELECT column1['menu'] FROM `dfs`.`default`.`./opt/drill/interval.json`
You can query elements that are multiple levels deep. Continuing the example, if menuitem is a child of menu, then use the following syntax:
SELECT column1['menu']['menuitem'] FROM `dfs`.`default`.`./opt/drill/interval.json`
- CSV/TSV
- You must specify columns using indices, which requires some familiarity with the data. For example,
the following query may return incorrect results:
SELECT * FROM `dfs`.`default`.`./root/Test/Table.csv`
When the query is modified to specify columns using indices, the query results are correct:
SELECT columns[0], columns[2] FROM `dfs`.`default`.`./root/Test/Table.csv`
- Hive
- When using a Hive schema, you can use standard queries without modifying them.
- Casting binary data
- Drill can work with self-describing data such as HBase and file systems without central
metadata definitions. In some scenarios, the file formats do not have defined data types
for the data. For example, HBase always treats data as binary. Drill provides auxiliary
functions to cast (or interpret) the data as certain data types. You must explicitly cast
binary data to another format to view the data. You must also enclose all table and schema
names in backticks (`). The following examples show you how to cast data as a specific
data type. For more information about the SQL queries in Drill, refer to the Apache Drill
documentation.
The following query displays results from an HBase database in binary format:
SELECT account['name'] FROM `hbase`.`students`
The following query displays the same results in string format:
SELECT CAST(account['name'] AS varchar(20)) FROM `hbase`.`students`
The following query displays results from a Parquet file in binary format:
SELECT column1 FROM `dfs`.`default`.`./opt/drill/test.parquet`
The following query displays the same results in string format:
SELECT CAST(column1 AS varchar(20)) FROM `dfs`.`default`.`./opt/drill/test.parquet`
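The casting queries above can be submitted through the driver with standard JDBC calls. The sketch below is a minimal illustration, not an excerpt from the driver's documentation: the driver class name (`org.apache.drill.jdbc.Driver`, the open-source Apache Drill driver) and the connection URL (`jdbc:drill:drillbit=localhost:31010`) are assumptions you would substitute for your own deployment. The helper that builds the CAST expression is separated out, and the query only runs against a cluster when an argument is passed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillCastExample {

    // Wrap a column expression in a CAST so binary data is returned as a string.
    static String castToVarchar(String expression, int length) {
        return "CAST(" + expression + " AS varchar(" + length + "))";
    }

    public static void main(String[] args) throws Exception {
        // Build the HBase casting example from the text; note the backticks
        // around the table and schema names, which Drill requires.
        String query = "SELECT " + castToVarchar("account['name']", 20)
                + " FROM `hbase`.`students`";
        System.out.println(query);

        // Hypothetical driver class and URL; adjust for your deployment.
        // Pass any argument to actually execute against a live Drillbit.
        if (args.length > 0) {
            Class.forName("org.apache.drill.jdbc.Driver");
            try (Connection conn = DriverManager
                         .getConnection("jdbc:drill:drillbit=localhost:31010");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```

Run with no arguments, the program simply prints the generated query so you can verify the CAST syntax before pointing it at a cluster.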
- Data types
The Drill JDBC driver supports many common data formats, converting between Drill, SQL, and
Java data types. Drill supports the following data type mappings:
Drill Type   SQL Type    Java Type
BIGINT       BIGINT      BIGINT
BIT          BOOLEAN     BOOLEAN
DATE         DATE        DATE
FLOAT4       FLOAT       REAL
FLOAT8       DOUBLE      DOUBLE
INT          INT         INTEGER
SMALLINT     SMALLINT    SMALLINT
TIMESTAMP    TIMESTAMP   TIMESTAMP
TINYINT      TINYINT     TINYINT
VARCHAR      VARCHAR     VARCHAR
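As a rough illustration of the SQL Type column above, the mapping can be expressed with the `java.sql.Types` constants that a JDBC application would compare against `ResultSetMetaData.getColumnType()`. This is a sketch derived only from the table, not code taken from the driver itself.

```java
import java.sql.Types;
import java.util.Map;

public class DrillTypeMapping {

    // java.sql.Types constant for each Drill type, per the mapping table.
    static final Map<String, Integer> SQL_TYPES = Map.of(
            "BIGINT",    Types.BIGINT,
            "BIT",       Types.BOOLEAN,
            "DATE",      Types.DATE,
            "FLOAT4",    Types.FLOAT,
            "FLOAT8",    Types.DOUBLE,
            "INT",       Types.INTEGER,
            "SMALLINT",  Types.SMALLINT,
            "TIMESTAMP", Types.TIMESTAMP,
            "TINYINT",   Types.TINYINT,
            "VARCHAR",   Types.VARCHAR);

    public static void main(String[] args) {
        // Example: a column Drill reports as FLOAT4 surfaces as SQL FLOAT.
        System.out.println(SQL_TYPES.get("FLOAT4") == Types.FLOAT); // prints true
    }
}
```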