Drill JDBC Driver Features
The Drill JDBC driver provides the following features:
- Catalog and schema support
- The Drill JDBC driver supports both catalogs and schemas in order to work with various JDBC applications. The Drill JDBC driver follows convention, adding a synthetic catalog named DRILL under which all of the schemas/databases are organized. The driver also maps the JDBC schema to the DRILL catalog.
- Schema types and file formats
- The Drill JDBC driver supports the following schema types:
- HBase
- Hive
- Distributed File System (DFS), supporting the following file formats:
- Parquet
- JavaScript Object Notation (JSON)
- Comma-Separated Values (CSV)
- Tab-Separated Values (TSV)
- Specifying column names
Use of the SQL asterisk (*) selector to return all columns in a table is limited; you
must modify queries to specify the columns to return. Refer to the following syntax
guidelines for each schema type or file format and modify your queries as needed. When
you write queries for Drill, you must enclose all table and schema names in backticks
(`). NOTE: The MapR ODBC Driver for Drill provides an application called Drill Explorer that helps generate queries for Drill. The Drill JDBC driver does not currently support Drill Explorer. To simplify the process of formulating valid queries for Drill, consider using Drill Explorer with the ODBC driver.
- HBase
- The Drill JDBC driver presents an HBase column-family as a JDBC column. An HBase column
is derived using the lookup scheme ColumnFamily['Column'] where ColumnFamily
is an HBase column-family and Column is an HBase column contained in
ColumnFamily. You must specify individual HBase columns, not column families, so
you need some familiarity with the data. For example, the following query may
return incorrect results:
SELECT * FROM `hbase`.`students`
When the query is modified to specify individual columns, the query results are correct:
SELECT CAST(account['name'] AS varchar(20)) FROM `hbase`.`students`
- Parquet
- When a query is used as a subquery, such as when you use a SQL view to connect to a data
source using a BI tool, you must specify column names in the query. For example, the BI
tool may not connect to your Drill data source successfully when using the following
query:
SELECT * FROM `dfs`.`default`.`./opt/drill/test.parquet`
When the query is modified to specify column names, the BI tool connects to your Drill data source successfully using the SQL view:
SELECT CAST(column1 AS varchar(20)) FROM `dfs`.`default`.`./opt/drill/test.parquet`
- JSON
- When a query is used as a subquery, such as when you use a SQL view to connect to a data
source using a BI tool, you must specify column names in the query. For example, the BI
tool may not connect to your Drill data source successfully when using the following
query:
SELECT * FROM `dfs`.`default`.`./opt/drill/interval.json`
When the query is modified to specify column names, the BI tool connects to your Drill data source successfully using the SQL view:
SELECT column1 FROM `dfs`.`default`.`./opt/drill/interval.json`
To query nested elements, use the following syntax, where menu is a child of column1:
SELECT column1['menu'] FROM `dfs`.`default`.`./opt/drill/interval.json`
You can query elements that are multiple levels deep. Continuing the example, if menuitem is a child of menu, then use the following syntax:
SELECT column1['menu']['menuitem'] FROM `dfs`.`default`.`./opt/drill/interval.json`
- CSV/TSV
- You must specify columns using indices, which requires some familiarity with the data. For example,
the following query may return incorrect results:
SELECT * FROM `dfs`.`default`.`./root/Test/Table.csv`
When the query is modified to specify columns using indices, the query results are correct:
SELECT columns[0], columns[2] FROM `dfs`.`default`.`./root/Test/Table.csv`
- Hive
- When using a Hive schema, you can use standard queries without modifying them.
- Casting binary data
- Drill can work with self-describing data such as HBase and file systems without central
metadata definitions. In some scenarios, the file formats do not have defined data types
for the data. For example, HBase always treats data as binary. Drill provides auxiliary
functions to cast (or interpret) the data as certain data types. You must explicitly cast
binary data to another format to view the data. You must also enclose all table and schema
names in backticks (`). The following examples show you how to cast data as a specific
data type. For more information about the SQL queries in Drill, refer to the Apache Drill
documentation.
The following query displays results from an HBase database in binary format:
SELECT account['name'] FROM `hbase`.`students`
The following query displays the same results in string format:
SELECT CAST(account['name'] AS varchar(20)) FROM `hbase`.`students`
The following query displays results from a Parquet file in binary format:
SELECT column1 FROM `dfs`.`default`.`./opt/drill/test.parquet`
The following query displays the same results in string format:
SELECT CAST(column1 AS varchar(20)) FROM `dfs`.`default`.`./opt/drill/test.parquet`
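The casting queries above can be submitted through the driver with standard JDBC calls. The sketch below is a minimal illustration, not an excerpt from the driver's documentation: the driver class name (`org.apache.drill.jdbc.Driver`, the open-source Apache Drill driver) and the connection URL (`jdbc:drill:drillbit=localhost:31010`) are assumptions you would substitute for your own deployment. The helper that builds the CAST expression is separated out, and the query only runs against a cluster when an argument is passed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillCastExample {

    // Wrap a column expression in a CAST so binary data is returned as a string.
    static String castToVarchar(String expression, int length) {
        return "CAST(" + expression + " AS varchar(" + length + "))";
    }

    public static void main(String[] args) throws Exception {
        // Build the HBase casting example from the text; note the backticks
        // around the table and schema names, which Drill requires.
        String query = "SELECT " + castToVarchar("account['name']", 20)
                + " FROM `hbase`.`students`";
        System.out.println(query);

        // Hypothetical driver class and URL; adjust for your deployment.
        // Pass any argument to actually execute against a live Drillbit.
        if (args.length > 0) {
            Class.forName("org.apache.drill.jdbc.Driver");
            try (Connection conn = DriverManager
                         .getConnection("jdbc:drill:drillbit=localhost:31010");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```

Run with no arguments, the program simply prints the generated query so you can verify the CAST syntax before pointing it at a cluster.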
- Data types
The Drill JDBC driver supports many common data formats, converting between Drill, SQL, and
Java data types. Drill supports the following data type mappings:
Drill Type   SQL Type    Java Type
BIGINT       BIGINT      BIGINT
BIT          BOOLEAN     BOOLEAN
DATE         DATE        DATE
FLOAT4       FLOAT       REAL
FLOAT8       DOUBLE      DOUBLE
INT          INT         INTEGER
SMALLINT     SMALLINT    SMALLINT
TIMESTAMP    TIMESTAMP   TIMESTAMP
TINYINT      TINYINT     TINYINT
VARCHAR      VARCHAR     VARCHAR
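As a rough illustration of the SQL Type column above, the mapping can be expressed with the `java.sql.Types` constants that a JDBC application would compare against `ResultSetMetaData.getColumnType()`. This is a sketch derived only from the table, not code taken from the driver itself.

```java
import java.sql.Types;
import java.util.Map;

public class DrillTypeMapping {

    // java.sql.Types constant for each Drill type, per the mapping table.
    static final Map<String, Integer> SQL_TYPES = Map.of(
            "BIGINT",    Types.BIGINT,
            "BIT",       Types.BOOLEAN,
            "DATE",      Types.DATE,
            "FLOAT4",    Types.FLOAT,
            "FLOAT8",    Types.DOUBLE,
            "INT",       Types.INTEGER,
            "SMALLINT",  Types.SMALLINT,
            "TIMESTAMP", Types.TIMESTAMP,
            "TINYINT",   Types.TINYINT,
            "VARCHAR",   Types.VARCHAR);

    public static void main(String[] args) {
        // Example: a column Drill reports as FLOAT4 surfaces as SQL FLOAT.
        System.out.println(SQL_TYPES.get("FLOAT4") == Types.FLOAT); // prints true
    }
}
```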