Configure Hive Directories
You can configure the following Hive directories:
- Hive Scratch Directory
- Hive Warehouse Directory
- Hive Error Logs Directory
Hive Scratch Directory
In Hive 1.0, MapR configures the Hive scratch directory to be
/user/<user.name>/tmp/hive/<user.name>
. In Hive 0.13,
the default scratch directory is /user/<user.name>/tmp/hive. The hive user must
have write access to the /user folder.
To modify this parameter, perform one of the following operations:
- Set this parameter in
the
hive-site.xml.
Copy the hive.exec.scratchdir property elements from the $HIVE_HOME/conf/hive-default.xml.template file and paste them into an XML configuration element in the $HIVE_HOME/conf/hive-site.xml file. Then, modify the value elements for these directories in the hive-site.xml file. -
Set this parameter from the Hive shell. Example:
hive> set hive.exec.scratchdir=/myvolume/tmp
How Hive Handles Scratch Directories on MapR
When a query requires Hive to query existing tables and create data for new tables, Hive uses the following workflow:
- Create the query scratch directory
hive_<timestamp>_<randomnumber>
under the Hive scratch directory. - Create the following directories as subdirectories of the scratch directory:
- Final query output directory. This directory's name takes the form
-ext-<number>
. -
An output directory for each MapReduce job. These directories' names take the form
-mr-<number>
.
- Final query output directory. This directory's name takes the form
- Hive executes the tasks, including MapReduce jobs and loading data to the query output directory.
- Hive loads the data from output directory into a table. By default, the table's
directory is in the
/user/hive/warehouse
directory. You can configure this location with thehive.metastore.warehouse.dir
parameter inhive-site.xml
, unless the table DDL specifies a custom location. Hive renames the output directory to the table directory in order to load the output data to the table. - The scratch directories are automatically deleted after the query completes successfully.
MapR uses volumes, which are logical units that enable you to apply
policies to a set of files, directories, and sub-volumes. When the output directory and
the table directory are in different volumes, this workflow involves moving data across
volumes. Moving data across volumes is slower than moving data within a volume.
Therefore, MapR sets hive.optimize.insert.dest.volume
to
true
to automatically create a scratch directory in the same volume
as the target table.
Hive Warehouse Directory
Hive tables are stored in the Hive warehouse directory. By
default, MapR configures the Hive warehouse directory to
be /user/hive/warehouse
under the root
volume. This default is defined in
the $HIVE_HOME/conf/hive-default.xml.template
file.
To modify this parameter, perform one of the following operations:
- Set this parameter in
the
hive-site.xml.
Copy the hive.metastore.warehouse.dir property elements from the $HIVE_HOME/conf/hive-default.xml.template file and paste them into an XML configuration element in the $HIVE_HOME/conf/hive-site.xml file. Then, modify the value elements for these directories in the hive-site.xml file. - Set this parameter from the Hive shell.
Example:
hive> set hive.metastore.warehouse.dir=/myvolume/mydirectory
Hive Error Logs Directory
The log files are stored in
/opt/mapr/hive/hive-<version>/logs/<user>
by default.
To modify the log location:
- Configure
hive.log.dir
in$HIVE_HOME/conf/hive-log4j.properties
file. Example:hive.log.dir=<other_location>
- Set the sticky bit on the new directory. Example:
chmod 1777 <other_location>