Logging Options on YARN

For YARN applications, the available logging options depend on the MapR version and the type of application you run. In version 4.0.2 and later, the following logging options are available:

For MapReduce v2 applications, the default logging option is to write log files to the local file system. However, centralized logging and YARN log aggregation are also available.
For non-MapReduce applications, the default logging option is to write log files to the local file system. However, YARN log aggregation is also available.

Centralized Logging for MapReduce v2

Centralized logging provides an application-centric view of all the log files generated by NodeManager nodes throughout the cluster. It enables users to gain a complete picture of application execution by having all the logs available in a single directory, without having to navigate from node to node.

A MapReduce program generates three types of log output:

Standard output stream: captured in the stdout file
Standard error stream: captured in the stderr file
Log4j logs: captured in the syslog file

Centralized logs are available cluster-wide because they are written to the following local volume on MapR-FS:

/var/mapr/local/<NodeManager node>/logs/yarn/userlogs
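Because each NodeManager node has its own copy of this directory, scripts that inspect centralized logs usually need to build the path once per node. The following is a minimal Python sketch, assuming only the path template shown above. The function name userlogs_dir, the node name node1, and the NFS mount point /mapr/my.cluster.com are hypothetical examples, not part of MapR's tooling.

```python
from pathlib import PurePosixPath

# Path template from the documentation; {node} stands in for <NodeManager node>.
USERLOGS_TEMPLATE = "var/mapr/local/{node}/logs/yarn/userlogs"


def userlogs_dir(node: str, mount_point: str = "/") -> PurePosixPath:
    """Return the centralized userlogs directory for one NodeManager node.

    mount_point is an assumption: "/" yields the MapR-FS path itself, while a
    hypothetical NFS mount such as "/mapr/my.cluster.com" yields the same
    directory as seen through that mount.
    """
    # Reject empty names or names containing "/", which would break the path.
    if not node or "/" in node:
        raise ValueError(f"invalid NodeManager node name: {node!r}")
    return PurePosixPath(mount_point) / USERLOGS_TEMPLATE.format(node=node)
```

For example, iterating over a list of node names and calling userlogs_dir(name, "/mapr/my.cluster.com") produces one directory per NodeManager node that can be inspected from an NFS client.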

Because the log files are stored in a separate local volume directory for each NodeManager node, run the maprcli job linklogs command to create symbolic links to all the logs in a single directory. You can then analyze the logs from an NFS mount point with tools such as grep and awk. You can also view the entire set of logs for a particular application in the HistoryServer UI.
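Once maprcli job linklogs has populated a directory with symbolic links, a grep-style search over it can also be scripted. The following Python sketch is illustrative only: it assumes the linked directory is reachable through an NFS mount and that the log files keep their names stdout, stderr, and syslog, but it makes no assumption about how linklogs lays out subdirectories. The function name grep_logs is hypothetical.

```python
import os
import re
from typing import Iterator, Tuple

# The three MapReduce log outputs described above.
LOG_NAMES = {"stdout", "stderr", "syslog"}


def grep_logs(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for every log line matching pattern.

    followlinks=True lets os.walk descend into symlinked directories in case
    the linked layout contains them; symlinks to files are listed either way.
    """
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in sorted(filenames):
            if name not in LOG_NAMES:
                continue  # skip anything that is not stdout, stderr, or syslog
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the search going over non-UTF-8 output.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")
```

A call such as grep_logs("/mapr/my.cluster.com/tmp/joblogs", r"ERROR") (hypothetical path) returns matches from every linked container log, similar to running grep -rn ERROR over the directory. Note that following symlinked directories can loop if a link points back at one of its ancestors.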

YARN Log Aggregation

The YARN log aggregation option moves the log files of completed applications from the local file system to MapR-FS, where they are aggregated. Users can then view the entire set of logs for a particular application in the HistoryServer UI or by running the yarn logs command.
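The yarn logs command takes the application ID through the standard Hadoop option -applicationId, and application IDs have the form application_<cluster timestamp>_<sequence number>. The following Python sketch wraps that command so aggregated logs can be fetched from a script. It assumes the yarn client is on the PATH and that the application has completed, since only completed applications have aggregated logs. The helper names yarn_logs_command and fetch_aggregated_logs are hypothetical.

```python
import re
import subprocess
from typing import List

# Application IDs look like application_1423000000000_0001.
APP_ID_RE = re.compile(r"application_\d+_\d+")


def yarn_logs_command(app_id: str) -> List[str]:
    """Build the argument list for `yarn logs -applicationId <app_id>`."""
    if not APP_ID_RE.fullmatch(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_aggregated_logs(app_id: str) -> str:
    """Run yarn logs and return its standard output as text.

    Assumes the yarn client is installed and on the PATH; check=True raises
    CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        yarn_logs_command(app_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Validating the ID before running the command catches a common mistake, passing a MapReduce job ID (job_...) where YARN expects an application ID (application_...).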