mapr copytable

Copies data from one MapR-DB JSON table to another MapR-DB JSON table.

If the destination table does not exist, mapr copytable creates the destination table with the same metadata (column families and access control expressions) as the source table, and then copies data.

If the destination table exists, mapr copytable copies data only.

Required Permissions

The user that runs mapr copytable must have the following permissions, which you can grant with access-control expressions:
  • The permission readAce on the volume where the source table is located, and the permission writeAce on the volume where the destination table is or will be located.
  • The permission adminperm on the source table.
  • The permission for column-family and column reads (readperm) on the data in the source table that you want to copy.
  • When bulkload = false, the permission for column writes (writeperm) on the destination table.
  • When bulkload = true (default), the permission to load the destination table with bulk loads (bulkloadperm).
  • If the destination table does not yet exist: createrenamefamily on the source table.

For information about how to set permissions on volumes, see Setting/Modifying Whole Volume ACEs.

For information about how to set permissions on tables, see Enabling Table Authorizations with Access Control Expressions.

NOTE: The mapr user is not treated as a superuser. MapR-DB does not allow the mapr user to run this utility unless that user is given the relevant permission or permissions with access-control expressions.

Syntax

mapr copytable 
-src <source table path>
-dst <destination table path>
[-fromID <start key>]
[-toID <end key>]
[-bulkload <true|false> (default: true)]
[-mapreduce <true|false> (default: true)]
[-cmpmeta <true|false> (default: true)]
[-numthreads <number of threads> (default: 16)]

Parameters

Parameter Description
src The path of the table that you want to copy from.
dst The path of the table that you want to copy to.
fromID The value of the _id field in the first document of the range of documents to copy. Use this parameter only if you are not copying the entire set of documents in a table. If you use this parameter, you must also use the -toID parameter.
toID The value of the _id field in the last document of the range of documents to copy. Use this parameter together with the -fromID parameter.
bulkload A Boolean value that specifies whether to perform a full bulk load of the table. The default is to use bulk loading (true). After a bulk load, you must set the -bulkload parameter of the table to false by running the command maprcli table edit -path <path to table> -bulkload false.
mapreduce

A Boolean value that specifies whether or not to use a MapReduce program to perform the copying operation. The default, preferred method is to use a MapReduce program (true).

When this parameter is set to false, a client process uses multiple threads to read rows of the source table and write rows to the destination table.

The MapReduce program runs as a MapReduce v1 job or a MapReduce v2 application, depending on the MapReduce mode that is configured on this node. For more information, see Managing the MapReduce Mode.

cmpmeta A Boolean value that specifies whether to compare table metadata, such as column families and ACEs. The default is to compare metadata (true). The comparison is performed only when the destination table exists before mapr copytable is run; it also checks that the user ID that runs mapr copytable has the proper permissions on the destination table.

Set the value of this parameter to false before copying a table that contains a single column family to a table that contains two or more column families.

numthreads When -mapreduce is false, this parameter specifies the number of threads allocated to perform the copying of data. The default is 16. If additional CPU resources are available, you might want to increase the number of threads to achieve better performance.
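
As a rough illustration of how the -mapreduce and -numthreads parameters combine, the following sketch composes a client-process copy command. The helper name copytable_client_cmd, the table paths, and the rule of thumb of two threads per CPU core are assumptions made for illustration, not MapR recommendations. The script only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print a mapr copytable command that copies with a
# multithreaded client process instead of a MapReduce program.
# Usage: copytable_client_cmd <source table> <destination table> <cpu cores>
copytable_client_cmd() {
    src=$1
    dst=$2
    cores=$3

    # Assumption: two threads per core, never below the documented default of 16.
    threads=$((cores * 2))
    if [ "$threads" -lt 16 ]; then
        threads=16
    fi

    # Only flags listed in the Syntax section are used here.
    echo "mapr copytable -src $src -dst $dst -mapreduce false -numthreads $threads"
}

# Example: a client node with 12 cores.
copytable_client_cmd /user1/tableA /user1/tableB 12
# prints: mapr copytable -src /user1/tableA -dst /user1/tableB -mapreduce false -numthreads 24
```

Whether a higher thread count actually helps depends on the spare CPU capacity of the client node, so treat the multiplier as a starting point to tune.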

Example

[user@hostname ~]$ mapr copytable -src /user1/tableA \
    -dst /mapr/clusterB/vol1/tableB -fromID user000001 -toID user009999
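
The follow-up step described for the bulkload parameter can be sketched as a two-step sequence: copy a key range with bulk loading, then set the bulkload flag of the destination table back to false. The function names run and copy_range_bulk and the DRY_RUN switch are hypothetical; the table paths and keys are taken from the example above. With DRY_RUN=1 (the default in this sketch) the commands are only printed, not executed.

```shell
#!/bin/sh
# Assumption: DRY_RUN=1 prints commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: copy a range of documents with a bulk load, then
# turn off bulk-load mode on the destination table.
# Usage: copy_range_bulk <source table> <destination table> <start key> <end key>
copy_range_bulk() {
    src=$1
    dst=$2
    from_id=$3
    to_id=$4

    # -fromID and -toID must be used together.
    if [ -z "$from_id" ] || [ -z "$to_id" ]; then
        echo "both a start key and an end key are required" >&2
        return 1
    fi

    run mapr copytable -src "$src" -dst "$dst" \
        -fromID "$from_id" -toID "$to_id" -bulkload true || return 1

    # After a bulk load, the table's bulkload setting must be set to false.
    run maprcli table edit -path "$dst" -bulkload false
}

copy_range_bulk /user1/tableA /mapr/clusterB/vol1/tableB user000001 user009999
```

Stopping before the maprcli step when the copy fails is a design choice of this sketch, so that a failed bulk load can be inspected before the table setting is changed.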

Monitoring mapr copytable Operations

Use one of the following methods to monitor the progress of the copying of table data:
  • If the copy table operation runs as a MapReduce v1 job, monitor the job using the JobTracker UI.
  • If the copy table operation runs as a MapReduce v2 application, monitor the application using the ResourceManager UI.
  • If the copy table operation runs as a client process, go to the Tables view of the destination table in the MapR Control System. Then, on the Region tab, monitor the pace at which the number of rows increases.