Run Pig Jobs with Oozie
Complete the following steps to configure Oozie to run Pig jobs:
- (Optional) Update the Pig shared libraries. By default, Oozie ships with
shared libraries for a specific Pig version. To update the shared libraries with the
version of Pig that you are running, complete the following steps:
- Stop Oozie:
maprcli node services -name oozie -action stop -nodes <space delimited list of nodes>
- Remove all files located within the
/opt/mapr/oozie/oozie-<version>/share2/lib/pig*/
directory EXCEPT the oozie-sharelib-pig-<version>-mapr.jar file.
- As of Oozie 4.2.0-1501, also remove all files located within the
/opt/mapr/oozie/oozie-<version>/share1/lib/pig*/
directory EXCEPT the oozie-sharelib-pig-<version>-mapr.jar file.
- Copy the pig-core JAR and the contents of the Pig lib directory into the Oozie shared libraries for Pig:
cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share2/lib/pig/
cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share2/lib/pig-2/
cp <PIG_HOME>/lib/* <OOZIE_HOME>/share2/lib/pig/
cp <PIG_HOME>/lib/* <OOZIE_HOME>/share2/lib/pig-2/
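- As of Oozie 4.2.0-1510, also copy the pig-core JAR and the contents of the Pig lib directory into the Oozie share1 libraries folder for Pig:
cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share1/lib/pig/
cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share1/lib/pig-2/
cp <PIG_HOME>/lib/* <OOZIE_HOME>/share1/lib/pig/
cp <PIG_HOME>/lib/* <OOZIE_HOME>/share1/lib/pig-2/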
- Remove the zookeeper JARs and h1 directories:
rm -rf <OOZIE_HOME>/share2/lib/pig/h1 <OOZIE_HOME>/share2/lib/pig/zookeeper*.jar
rm -rf <OOZIE_HOME>/share2/lib/pig-2/h1 <OOZIE_HOME>/share2/lib/pig-2/zookeeper*.jar
- As of Oozie 4.2.0-1510, also remove the zookeeper JARs and h1 directories from the Oozie share1 libraries folder:
rm -rf <OOZIE_HOME>/share1/lib/pig/h1 <OOZIE_HOME>/share1/lib/pig/zookeeper*.jar
rm -rf <OOZIE_HOME>/share1/lib/pig-2/h1 <OOZIE_HOME>/share1/lib/pig-2/zookeeper*.jar
- Start Oozie:
maprcli node services -name oozie -action start -nodes <space delimited list of nodes>
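The file-level steps above (remove everything except the sharelib JAR, copy the Pig JARs, then drop h1 and the zookeeper JARs) repeat for each library directory, so they can be wrapped in one helper. The following is a minimal sketch, not a supported tool: the function name refresh_pig_sharelib is hypothetical, and cp -r is used instead of plain cp on the assumption that the Pig lib directory contains subdirectories such as h1. Stop Oozie before running it and start Oozie afterwards, as described in the steps.

```shell
# Sketch only: refresh one Oozie Pig sharelib directory from a Pig install.
# refresh_pig_sharelib is a hypothetical helper, not part of Oozie or Pig.
set -eu

refresh_pig_sharelib() {
  pig_home="$1"   # stands in for <PIG_HOME>
  lib_dir="$2"    # for example <OOZIE_HOME>/share2/lib/pig or .../pig-2

  # Remove everything in the directory except the Oozie sharelib JAR for Pig.
  find "$lib_dir" -mindepth 1 -maxdepth 1 \
    ! -name 'oozie-sharelib-pig-*-mapr.jar' -exec rm -rf {} +

  # Copy pig-core and the contents of the Pig lib directory.
  # Assumption: -r is added so that subdirectories do not make cp fail.
  cp "$pig_home/pig-core-h2.jar" "$lib_dir/"
  cp -r "$pig_home"/lib/* "$lib_dir/"

  # Drop the h1 directory and any zookeeper JARs that were copied along.
  rm -rf "$lib_dir/h1" "$lib_dir"/zookeeper*.jar
}
```

A call such as refresh_pig_sharelib "$PIG_HOME" "$OOZIE_HOME/share2/lib/pig" would then be repeated for pig-2 and, where applicable, for the share1 directories; PIG_HOME and OOZIE_HOME here are assumed environment variables standing in for the placeholders used in the commands.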
NOTE: If high availability is enabled for Oozie, repeat steps 2 through 7 on all nodes where Oozie is installed.
- As of Oozie 4.1.0-1601 and Oozie 4.2.0-1601, if the
oozie.service.WorkflowAppService.system.libpath
property in oozie-site.xml does not use the default value (/oozie/share/), you must perform the following steps to update the shared libraries:
- Based on the cluster MapReduce mode, run one of the following commands to copy the new Oozie shared libraries to MapR-FS:
Cluster MapReduce Mode | Command
YARN | sudo -u mapr {OOZIE_HOME}/bin/oozie-setup.sh sharelib create -fs maprfs:/// -locallib /opt/mapr/oozie/oozie-<version>/share2
Classic | sudo -u mapr {OOZIE_HOME}/bin/oozie-setup.sh sharelib create -fs maprfs:/// -locallib /opt/mapr/oozie/oozie-<version>/share1
- Run the following command to update the Oozie classpath with the new shared libraries:
sudo -u mapr {OOZIE_HOME}/bin/oozie admin -sharelibupdate
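The only difference between the two sharelib create commands is the local library directory: share2 for YARN and share1 for Classic. A small sketch of that selection follows. The helper name sharelib_dir_for_mode is hypothetical, and it assumes that OOZIE_HOME resolves to /opt/mapr/oozie/oozie-<version>.

```shell
# Sketch only: pick the local sharelib directory for the cluster MapReduce mode.
# sharelib_dir_for_mode is a hypothetical helper, not an Oozie command.
sharelib_dir_for_mode() {
  mode="$1"         # "yarn" or "classic"
  oozie_home="$2"   # assumed to be /opt/mapr/oozie/oozie-<version>
  case "$mode" in
    yarn)    echo "$oozie_home/share2" ;;
    classic) echo "$oozie_home/share1" ;;
    *)       echo "unknown MapReduce mode: $mode" >&2; return 1 ;;
  esac
}

# Example use (not executed here; requires a running cluster):
#   dir=$(sharelib_dir_for_mode yarn "$OOZIE_HOME")
#   sudo -u mapr "$OOZIE_HOME/bin/oozie-setup.sh" sharelib create -fs maprfs:/// -locallib "$dir"
#   sudo -u mapr "$OOZIE_HOME/bin/oozie" admin -sharelibupdate
```

Failing on an unknown mode keeps a mistyped value from silently uploading the wrong library set.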
- Configure a Pig workflow.
- Edit the workflow.xml file to include the following:
- Specify the shared library with the oozie.action.sharelib.for.pig property. With MapR distribution versions 4.0.0 and later, set this property to pig-2.
- Optionally, specify the name of the script (for example, id.pig) that contains the Pig query in the <script> element.
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/output-data/pig"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.action.sharelib.for.pig</name>
                    <value>pig-2</value>
                </property>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <script>id.pig</script>
            <param>INPUT=/user/${wf:user()}/input-data/text</param>
            <param>OUTPUT=/user/${wf:user()}/output-data/pig</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
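The workflow references ${jobTracker}, ${nameNode}, and ${queueName}, which Oozie typically resolves from a job properties file supplied at submission time. The sketch below writes such a file. Every value in it is an assumption or placeholder rather than something stated in this procedure: the jobTracker value must be replaced with the address for your cluster, and the application path pig-wf is only an illustrative directory name.

```shell
# Sketch only: write a job.properties that supplies the variables used in workflow.xml.
# All values are placeholders or assumptions; replace them for your cluster.
# The heredoc delimiter is quoted so that ${...} is written literally for Oozie to resolve.
cat > job.properties <<'EOF'
nameNode=maprfs:///
jobTracker=<jobtracker-or-resourcemanager-address>
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/pig-wf
EOF
```

With the workflow.xml and the Pig script copied to the application path, the job would usually be submitted with the Oozie client, for example oozie job -config job.properties -run, with the Oozie server URL given through the -oozie option or the OOZIE_URL environment variable.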