Run Hive Jobs with Oozie
Complete the following steps to configure Oozie to submit Hive jobs:
- (Optional) Update the Hive shared libraries. By default, Oozie ships with
shared libraries for a specific Hive version. To update the shared libraries with the
version of Hive that you are running, complete the following steps:
  - Stop Oozie:
      maprcli node services -name oozie -action stop -nodes <space delimited list of nodes>
  - Remove the Hive libraries from <OOZIE_HOME>/share2/lib/hive/:
      rm -rf /opt/mapr/oozie/oozie-<version>/share2/lib/hive/hive-*
      rm -rf /opt/mapr/oozie/oozie-<version>/share2/lib/hive/jline-*
  - As of Oozie 4.2.0-1510, also remove the Hive libraries from <OOZIE_HOME>/share1/lib/hive/:
      rm -rf /opt/mapr/oozie/oozie-<version>/share1/lib/hive/hive-*
      rm -rf /opt/mapr/oozie/oozie-<version>/share1/lib/hive/jline-*
  - Copy the following JAR files from <HIVE_HOME>/lib/ to <OOZIE_HOME>/share2/lib/hive/:
      hive-ant, hive-cli, hive-common, hive-contrib, hive-exec, hive-metastore,
      hive-serde, hive-service, hive-shims, hive-shims-0.20, hive-shims-0.20S,
      hive-shims-0.23, hive-shims-common, hive-shims-common-secure
    Example:
      cp /opt/mapr/hive/hive-<version>/lib/{hive-ant*.jar,hive-cli*.jar,hive-common*.jar,hive-contrib*.jar,hive-exec*.jar,hive-metastore*.jar,hive-serde*.jar,hive-service*.jar,hive-shims*.jar} /opt/mapr/oozie/oozie-<version>/share2/lib/hive/
      cp /opt/mapr/hive/hive-<version>/lib/jline-* /opt/mapr/oozie/oozie-<version>/share2/lib/hive/
  - As of Oozie 4.2.0-1510, also copy the following JAR files from <HIVE_HOME>/lib/ to <OOZIE_HOME>/share1/lib/hive/:
      hive-ant, hive-cli, hive-common, hive-contrib, hive-exec, hive-metastore,
      hive-serde, hive-service, hive-shims, hive-shims-0.20, hive-shims-0.20S,
      hive-shims-0.23, hive-shims-common, hive-shims-common-secure
    Example:
      cp /opt/mapr/hive/hive-<version>/lib/{hive-ant*.jar,hive-cli*.jar,hive-common*.jar,hive-contrib*.jar,hive-exec*.jar,hive-metastore*.jar,hive-serde*.jar,hive-service*.jar,hive-shims*.jar} /opt/mapr/oozie/oozie-<version>/share1/lib/hive/
      cp /opt/mapr/hive/hive-<version>/lib/jline-* /opt/mapr/oozie/oozie-<version>/share1/lib/hive/
  - Start Oozie:
      maprcli node services -name oozie -action start -nodes <space delimited list of nodes>
    NOTE: If high availability is enabled for Oozie, perform steps a through e on all nodes where Oozie is installed.
  - As of Oozie 4.1.0-1601 and Oozie 4.2.0-1601, if the oozie.service.WorkflowAppService.system.libpath property in oozie-site.xml does not use the default value (/oozie/share/lib), you must perform the following steps to update the shared libraries:
    - Based on the cluster MapReduce mode, run one of the following commands to copy the new Oozie shared libraries to MapR-FS:
        Cluster MapReduce Mode   Command
        YARN                     sudo -u mapr {OOZIE_HOME}/bin/oozie-setup.sh sharelib create -fs maprfs:/// -locallib /opt/mapr/oozie/oozie-<version>/share2
        Classic                  sudo -u mapr {OOZIE_HOME}/bin/oozie-setup.sh sharelib create -fs maprfs:/// -locallib /opt/mapr/oozie/oozie-<version>/share1
    - Run the following command to update the Oozie classpath with the new shared libraries:
        sudo -u mapr {OOZIE_HOME}/bin/oozie admin -sharelibupdate
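
The remove and copy steps above can be scripted. The following is a minimal sketch, not a supported tool: refresh_hive_libs and sharelib_dir_for_mode are hypothetical helper names, and the directories are passed in as arguments rather than discovered. It assumes Oozie has already been stopped with maprcli before the helper runs.

```shell
#!/bin/sh
# Sketch only: both function names are illustrative, not part of Oozie or MapR.

# Replace the Hive and jline JARs in one Oozie shared-library directory.
#   $1 = Hive lib directory, for example /opt/mapr/hive/hive-<version>/lib
#   $2 = Oozie Hive sharelib directory, for example
#        /opt/mapr/oozie/oozie-<version>/share2/lib/hive
refresh_hive_libs() (
  hive_lib=$1
  oozie_hive_lib=$2
  [ -d "$hive_lib" ] || { echo "missing Hive lib dir: $hive_lib" >&2; exit 1; }
  [ -d "$oozie_hive_lib" ] || { echo "missing Oozie dir: $oozie_hive_lib" >&2; exit 1; }

  # Remove the Hive and jline files that shipped with Oozie.
  rm -rf "$oozie_hive_lib"/hive-* "$oozie_hive_lib"/jline-*

  # Copy the JARs that match the installed Hive version.
  for jar in "$hive_lib"/hive-ant*.jar "$hive_lib"/hive-cli*.jar \
             "$hive_lib"/hive-common*.jar "$hive_lib"/hive-contrib*.jar \
             "$hive_lib"/hive-exec*.jar "$hive_lib"/hive-metastore*.jar \
             "$hive_lib"/hive-serde*.jar "$hive_lib"/hive-service*.jar \
             "$hive_lib"/hive-shims*.jar "$hive_lib"/jline-*; do
    [ -f "$jar" ] || continue   # skip patterns that matched nothing
    cp "$jar" "$oozie_hive_lib"/
  done
)

# Print the local sharelib directory for a cluster MapReduce mode.
#   $1 = Oozie home directory, $2 = mode (yarn or classic)
sharelib_dir_for_mode() (
  case $2 in
    yarn)    echo "$1/share2" ;;
    classic) echo "$1/share1" ;;
    *)       echo "unknown MapReduce mode: $2" >&2; exit 1 ;;
  esac
)

# Example use (paths are placeholders):
#   refresh_hive_libs /opt/mapr/hive/hive-<version>/lib \
#       /opt/mapr/oozie/oozie-<version>/share2/lib/hive
#   sharelib_dir_for_mode /opt/mapr/oozie/oozie-<version> yarn
```

Passing the directories as arguments lets the same helper cover both share1 and share2, and makes it possible to try the script against a scratch copy before touching the real installation.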
- (Optional) Configure Hive to use the metastore server.
  - To use a metastore server for the Hive job, add the following parameter to the hive-site.xml file:
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://<IP address>:<port></value>
        <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
      </property>
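
Before submitting a workflow, it can help to confirm that the value chosen for hive.metastore.uris has the expected shape. The sketch below is an illustration only: parse_metastore_uri is a hypothetical helper, it handles a single thrift://host:port value (not a comma-separated list), and the address and port 9083 in the usage comment are example values.

```shell
#!/bin/sh
# Sketch only: parse_metastore_uri is an illustrative name, not a Hive or Oozie tool.

# Split a thrift://<host>:<port> URI and print "<host> <port>".
parse_metastore_uri() (
  uri=$1
  case $uri in
    thrift://*:*) ;;
    *) echo "expected thrift://<host>:<port>, got: $uri" >&2; exit 1 ;;
  esac
  hostport=${uri#thrift://}   # drop the scheme
  host=${hostport%:*}         # everything before the last colon
  port=${hostport##*:}        # everything after the last colon
  echo "$host $port"
)

# Example use (assumes a netcat build that supports -z):
#   set -- $(parse_metastore_uri thrift://10.10.1.5:9083)
#   nc -z "$1" "$2" && echo "metastore port is reachable"
```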
- Configure a Hive workflow. As of Oozie 4.2.0-1508, you can configure Oozie to run a workflow by connecting to either Hive Metastore or HiveServer2. Previously, Oozie could only submit jobs to Hive Metastore.
  Configure a Hive Workflow with Connection to Hive Metastore
  - Copy the edited hive-site.xml file to the same location as your workflow.xml file.
  - Edit the workflow.xml file to include the following:
    - Specify the hive-site.xml in the job-xml parameter.
    - Specify the name of the script (for example, script.q) that contains the Hive query in the script parameter.
    - Optionally, add properties used by the Oozie launcher job. Add the prefix oozie.launcher to the property names.
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
      <start to="hive-node"/>
      <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive"/>
            <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
          </prepare>
          <job-xml>hive-site.xml</job-xml>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
          </configuration>
          <script>script.q</script>
          <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>
          <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
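
The workflow above references ${jobTracker}, ${nameNode}, ${queueName}, and ${examplesRoot}, which Oozie resolves from the job properties supplied at submission time. The following sketch writes such a file. Every value in it is an assumption for illustration (maprfs:/// for both nameNode and jobTracker, a queue named default, an application directory under apps/hive), and write_job_properties is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: write_job_properties is an illustrative name; all values are examples.

# Write a job.properties file that defines the variables used by workflow.xml.
#   $1 = output file, $2 = nameNode value, $3 = jobTracker value
write_job_properties() (
  out=$1
  name_node=$2
  job_tracker=$3
  {
    printf 'nameNode=%s\n' "$name_node"
    printf 'jobTracker=%s\n' "$job_tracker"
    printf 'queueName=default\n'
    printf 'examplesRoot=examples\n'
    printf 'oozie.use.system.libpath=true\n'
    # Single quotes keep ${...} literal so that Oozie, not the shell, expands it.
    printf '%s\n' 'oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/hive'
  } > "$out"
)

# Example use (the Oozie URL is a placeholder):
#   write_job_properties job.properties maprfs:/// maprfs:///
#   oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run
```

Setting oozie.use.system.libpath to true is what makes the action pick up the shared libraries described earlier, so it is worth keeping even if the other values change.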
  Configure a Hive Workflow with Connection to HiveServer2
  - Copy the edited hive-site.xml file to the same location as your workflow.xml file.
  - Edit the workflow.xml file to include the following:
    - Specify the JDBC URL used by Beeline for connections to HiveServer2 in the jdbc-url element. See Connecting to HiveServer2 for details.
    - Specify the name of the script (for example, script.q) that contains the Hive query in the script element.
    <?xml version="1.0" encoding="UTF-8"?>
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">
      <start to="hive2-node"/>
      <action name="hive2-node">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/output-data/hive2"/>
            <mkdir path="${nameNode}/user/${wf:user()}/output-data"/>
          </prepare>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
          </configuration>
          <jdbc-url>jdbc:hive2://localhost:10000/default</jdbc-url>
          <script>script.q</script>
          <param>INPUT=/user/${wf:user()}/input-data/table</param>
          <param>OUTPUT=/user/${wf:user()}/output-data/hive2</param>
        </hive2>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
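
Both workflows pass INPUT and OUTPUT to script.q through param elements. As an illustration of how a script might consume them, the sketch below writes a small script.q. The table name test, its single INT column, and the helper name write_hive_script are assumptions made for the example, not requirements of Oozie or Hive.

```shell
#!/bin/sh
# Sketch only: write_hive_script is an illustrative name; the query is an example.

# Write a Hive script that reads from ${INPUT} and writes to ${OUTPUT}.
#   $1 = output file (for example, script.q)
write_hive_script() (
  # The quoted 'EOF' delimiter stops the shell from expanding ${INPUT} and
  # ${OUTPUT}; they are substituted when the Hive action runs the script.
  cat > "$1" <<'EOF'
-- INPUT and OUTPUT are supplied by the <param> elements in workflow.xml.
CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;
EOF
)

# Example use (the application directory is a placeholder):
#   write_hive_script script.q
#   hadoop fs -put script.q workflow.xml hive-site.xml <application directory>
```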