Bulk Loading and MapR-DB Binary Tables

The most common way of loading data to a MapR-DB binary table is with a put operation. At large scales, however, bulk loads offer a performance advantage over put operations.

A bulk load can be performed as a full bulk load or as an incremental bulk load.

The following tools support bulk loading. Each can be used for both full and incremental bulk load operations:

  • MapR CopyTable utility.

    Info: This utility is different from Apache HBase's CopyTable utility. When copying data to MapR-DB binary tables, it is recommended to use the MapR-DB version, which copies table metadata, access control expressions, and more in addition to table data.

  • The ImportFiles tool, which imports HFile or Result files into a MapR-DB binary table:

      hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -Dmapred.reduce.tasks=2 -inputDir /test/tabler.kv -table /table2 -format Result

    If you are running on an HBase 0.98 client but the exported files were generated with HBase 0.94, include -Dhbase.import.version=0.94 in the ImportFiles job.
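
The ImportFiles invocation above can be assembled by a small wrapper that adds the version option only when it is needed. The following is a minimal dry-run sketch: build_import_cmd is a hypothetical helper name, not part of MapR-DB, and placing -Dhbase.import.version next to the other -D option is an assumption. The script only prints the command and does not run it.

```shell
#!/bin/sh
# Dry-run sketch: assemble the ImportFiles command line and print it.
# build_import_cmd is a hypothetical helper, not part of MapR-DB.
# Arguments: input directory, table path, file format,
#            optional HBase version that generated the exported files.
build_import_cmd() {
  input_dir=$1
  table=$2
  format=$3
  src_version=${4:-}

  cmd="hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -Dmapred.reduce.tasks=2"

  # Add the version option only when the files came from an older HBase
  # release (for example 0.94 files imported from a 0.98 client).
  # Assumption: the option is placed alongside the other -D option.
  if [ -n "$src_version" ]; then
    cmd="$cmd -Dhbase.import.version=$src_version"
  fi

  cmd="$cmd -inputDir $input_dir -table $table -format $format"
  printf '%s\n' "$cmd"
}

# Print the command for Result files exported with HBase 0.94.
build_import_cmd /test/tabler.kv /table2 Result 0.94
```

Calling the helper without the fourth argument prints the same command as the example in the list above, with no version option.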

Full Bulk Loads

Full bulk loads offer the greatest performance advantage, but they can be performed only on empty binary tables. A full bulk load skips the write-ahead log (WAL) that Apache HBase and MapR-DB binary-table operations normally use, which increases performance.

Note: You can perform a full bulk load only on empty tables that have the bulk load attribute set to true. You can set this value only when creating a table.

Tables are unavailable for normal client operations, including put, get, and scan operations, while a full bulk load operation is in progress.
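
The sequence around a full bulk load can be outlined as a dry-run script. This is a sketch under stated assumptions, not a definitive procedure: full_bulk_load_plan is a hypothetical helper, the -bulkload option of maprcli table create and maprcli table edit is assumed from the MapR command-line reference rather than taken from this section, and the final step (clearing the attribute once the load finishes) is likewise an assumption. The script only prints the steps and runs nothing against a cluster.

```shell
#!/bin/sh
# Dry-run sketch of the steps around a full bulk load.
# full_bulk_load_plan is a hypothetical helper; it prints commands only.
# Argument: path of the table to create and load.
full_bulk_load_plan() {
  table=$1

  # 1. Create the empty table with the bulk load attribute set to true.
  #    Assumption: the attribute is exposed as -bulkload on table create.
  printf '%s\n' "maprcli table create -path $table -bulkload true"

  # 2. Placeholder for the load job itself (MapR CopyTable or ImportFiles).
  #    Tool options that select a full bulk load are not covered here.
  printf '%s\n' "# run the bulk load job against $table"

  # 3. Assumption: clear the attribute after the load completes so that
  #    normal client operations (put, get, scan) can resume.
  printf '%s\n' "maprcli table edit -path $table -bulkload false"
}

# Print the plan for a hypothetical table path.
full_bulk_load_plan /table2
```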

Incremental Bulk Loads

Incremental bulk loads can add data to existing tables concurrently with other table operations, with better performance than put operations. This type of bulk load makes use of write-ahead log files.

You can use incremental bulk loads to ingest large amounts of data into an existing table. Tables remain available for standard client operations, such as put, get, and scan operations, while an incremental bulk load is in progress. Multiple incremental bulk loads can run against the same table simultaneously.

Whether you create a table with the maprcli table create command, with the hbase shell’s create command, or in MCS, incremental loads are supported by default.