Run Pig Jobs with Oozie

Complete the following steps to configure Oozie to run Pig jobs:
  1. (Optional) Update the Pig shared libraries. By default, Oozie ships with shared libraries for a specific Pig version. To update the shared libraries with the version of Pig that you are running, complete the following steps:
    1. Stop Oozie:
      maprcli node services -name oozie -action stop -nodes <space delimited list of nodes>
    2. Remove all files located within the /opt/mapr/oozie/oozie-<version>/share2/lib/pig*/ directory EXCEPT the oozie-sharelib-pig-<version>-mapr.jar file.
    3. As of Oozie 4.2.0-1510, also remove all files located within the /opt/mapr/oozie/oozie-<version>/share1/lib/pig*/ directory EXCEPT the oozie-sharelib-pig-<version>-mapr.jar file.
    4. Copy the pig-core JAR and the contents of the Pig lib directory into the Oozie shared libraries for Pig:
      cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share2/lib/pig/
      cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share2/lib/pig-2/
      cp <PIG_HOME>/lib/* <OOZIE_HOME>/share2/lib/pig/
      cp <PIG_HOME>/lib/* <OOZIE_HOME>/share2/lib/pig-2/ 
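    5. As of Oozie 4.2.0-1510, also copy the pig-core JAR and the contents of the Pig lib directory into the Oozie share1 libraries folder for Pig:
      cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share1/lib/pig/
      cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share1/lib/pig-2/
      cp <PIG_HOME>/lib/* <OOZIE_HOME>/share1/lib/pig/
      cp <PIG_HOME>/lib/* <OOZIE_HOME>/share1/lib/pig-2/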
    6. Remove the zookeeper jars and h1 directories:
      rm -rf <OOZIE_HOME>/share2/lib/pig/h1 <OOZIE_HOME>/share2/lib/pig/zookeeper*.jar
      rm -rf <OOZIE_HOME>/share2/lib/pig-2/h1 <OOZIE_HOME>/share2/lib/pig-2/zookeeper*.jar
    7. As of Oozie 4.2.0-1510, also remove the zookeeper jars and h1 directories from the Oozie share1 libraries folder:
      rm -rf <OOZIE_HOME>/share1/lib/pig/h1 <OOZIE_HOME>/share1/lib/pig/zookeeper*.jar
      rm -rf <OOZIE_HOME>/share1/lib/pig-2/h1 <OOZIE_HOME>/share1/lib/pig-2/zookeeper*.jar
    8. Start Oozie:
      maprcli node services -name oozie -action start -nodes <space delimited list of nodes>
      NOTE: If high availability is enabled for Oozie, repeat steps 2 through 7 on all nodes where Oozie is installed.
    9. As of Oozie 4.1.0-1601 and Oozie 4.2.0-1601, if the oozie.service.WorkflowAppService.system.libpath property in oozie-site.xml does not use the default value (/oozie/share/), you must perform the following steps to update the shared libraries:
      1. Based on the cluster MapReduce mode, run one of the following commands to copy the new Oozie shared libraries to MapR-FS:
        Cluster MapReduce Mode    Command
        YARN                      sudo -u mapr {OOZIE_HOME}/bin/oozie-setup.sh sharelib create -fs maprfs:/// -locallib /opt/mapr/oozie/oozie-<version>/share2
        Classic                   sudo -u mapr {OOZIE_HOME}/bin/oozie-setup.sh sharelib create -fs maprfs:/// -locallib /opt/mapr/oozie/oozie-<version>/share1
      2. Run the following command to update the Oozie classpath with the new shared libraries:
        sudo -u mapr {OOZIE_HOME}/bin/oozie admin -sharelibupdate
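Steps 2 through 7 of the shared library update can be sketched as one shell function, which may help when the same cleanup has to be repeated on several Oozie nodes. This is a minimal sketch, not a MapR-supplied script. The function name refresh_pig_sharelib is hypothetical, it takes the Pig and Oozie home directories as arguments, it skips share1 (or any pig directory) when that directory does not exist, and it uses cp -R, which is an assumption so that subdirectories under the Pig lib directory are copied as well. Stop Oozie before running it and start Oozie afterward, as in steps 1 and 8.

```shell
#!/bin/sh
# Sketch of steps 2 through 7: refresh the Pig jars in the Oozie shared libraries.
# refresh_pig_sharelib is a hypothetical helper name, not part of Oozie or Pig.
# Usage: refresh_pig_sharelib <PIG_HOME> <OOZIE_HOME>
refresh_pig_sharelib() {
    pig_home=$1
    oozie_home=$2

    # share2 is always handled; share1 applies as of Oozie 4.2.0-1510.
    for share in share2 share1; do
        # Steps 2 and 3: remove everything except the oozie-sharelib-pig jar.
        for dir in "$oozie_home/$share"/lib/pig*/; do
            [ -d "$dir" ] || continue
            find "$dir" -mindepth 1 -maxdepth 1 \
                ! -name 'oozie-sharelib-pig-*-mapr.jar' -exec rm -rf {} +
        done

        for name in pig pig-2; do
            dest="$oozie_home/$share/lib/$name"
            [ -d "$dest" ] || continue
            # Steps 4 and 5: copy pig-core and the Pig lib contents.
            # -R is an assumption so that subdirectories are copied too.
            cp "$pig_home/pig-core-h2.jar" "$dest/"
            cp -R "$pig_home"/lib/* "$dest/"
            # Steps 6 and 7: remove the h1 directory and the zookeeper jars.
            rm -rf "$dest/h1" "$dest"/zookeeper*.jar
        done
    done
}
```

For example, after stopping Oozie, call refresh_pig_sharelib <PIG_HOME> <OOZIE_HOME> as a user with write access to the Oozie installation directory, then start Oozie again.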
  2. Configure a Pig workflow.
    1. Edit the workflow.xml file to include the following:
      1. Specify the shared library with the oozie.action.sharelib.for.pig property. With MapR distribution versions 4.0.0 and later, set this property to pig-2.
      2. Optionally, specify the name of the script that contains the Pig query (for example, id.pig) in the script element.
        <workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
            <start to="pig-node"/>
            <action name="pig-node">
                <pig>
                    <job-tracker>${jobTracker}</job-tracker>
                    <name-node>${nameNode}</name-node>
                    <prepare>
                        <delete path="${nameNode}/user/${wf:user()}/output-data/pig"/>
                    </prepare>
                    <configuration>
                        <property>
                            <name>mapred.job.queue.name</name>
                            <value>${queueName}</value>
                        </property>
                        <property>
                            <name>oozie.action.sharelib.for.pig</name>
                            <value>pig-2</value>
                        </property>
                        <property>
                            <name>mapred.compress.map.output</name>
                            <value>true</value>
                        </property>
                    </configuration>
                    <script>id.pig</script>
                    <param>INPUT=/user/${wf:user()}/input-data/text</param>
                    <param>OUTPUT=/user/${wf:user()}/output-data/pig</param>
                </pig>
                <ok to="end"/>
                <error to="fail"/>
            </action>
            <kill name="fail">
                <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
            </kill>
            <end name="end"/>
        </workflow-app>
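
The workflow above leaves ${jobTracker}, ${nameNode}, and ${queueName} to be supplied when the job is submitted, which is typically done through a job.properties file. The following minimal sketch writes such a file. The helper name write_job_properties is hypothetical, and the values shown (maprfs:/// for the name node and job tracker, default for the queue, and the application path passed as an argument) are assumptions to adapt to your cluster.

```shell
#!/bin/sh
# Sketch: write a job.properties file for the Pig workflow.
# write_job_properties is a hypothetical helper; all values are assumptions.
# Usage: write_job_properties <output directory> <workflow application path>
write_job_properties() {
    out_dir=$1
    app_path=$2
    # \${nameNode} is escaped so that Oozie, not the shell, resolves it.
    cat > "$out_dir/job.properties" <<EOF
nameNode=maprfs:///
jobTracker=maprfs:///
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=\${nameNode}$app_path
EOF
}
```

With the workflow.xml and the Pig script uploaded to the application path, the job could then be submitted with a command along the lines of oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run, where <oozie-host> is a placeholder for the node running the Oozie server.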