Infrastructure
Network Time
To keep all cluster nodes time-synchronized, MapR requires software such as a Network Time Protocol (NTP) server to be configured and running on every node. If server clocks in the cluster drift out of sync, serious problems will occur with HBase and other MapR services. MapR raises a Time Skew alarm on any out-of-sync nodes. See http://www.ntp.org/ for more information about obtaining and installing NTP.
Advanced: Installing an internal NTP server keeps your cluster synchronized even when an outside NTP server is inaccessible.
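To see which nodes have drifted before the Time Skew alarm fires, you can compare each node's clock with the local one over SSH. The sketch below assumes passwordless SSH to every node. The `MAX_SKEW_SECONDS` threshold and the function names are hypothetical; the threshold is not the value MapR uses for its alarm. The comparison has one-second resolution and includes SSH round-trip time, so treat it as a coarse check, not a replacement for the NTP daemon's own status.

```shell
#!/usr/bin/env bash
# Sketch: coarse clock-skew check across nodes.
# MAX_SKEW_SECONDS is a hypothetical threshold, not MapR's Time Skew alarm value.
MAX_SKEW_SECONDS=${MAX_SKEW_SECONDS:-2}

# Absolute difference between two epoch timestamps, in seconds.
skew_seconds() {
  local d=$(( $1 - $2 ))
  echo $(( d < 0 ? -d : d ))
}

# Compare each node's clock (read over SSH) with the local clock.
# Prints offending nodes and returns non-zero if any node is off or unreachable.
check_nodes() {
  local node remote local_now skew status=0
  for node in "$@"; do
    remote=$(ssh -o BatchMode=yes "$node" date +%s) || { echo "$node: unreachable"; status=1; continue; }
    local_now=$(date +%s)
    skew=$(skew_seconds "$local_now" "$remote")
    if [ "$skew" -gt "$MAX_SKEW_SECONDS" ]; then
      echo "$node: clock off by ${skew}s"
      status=1
    fi
  done
  return $status
}
```

For example, `check_nodes node1 node2 node3` checks three nodes. The host names are placeholders.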
Syslog
Syslog must be enabled on each node to preserve logs about killed processes or failed jobs. Several syslog implementations are in common use, including syslog-ng and rsyslog, so the daemon to check for varies by distribution. One of the following commands should confirm that a syslog daemon is present:
syslogd -v
service syslog status
rsyslogd -v
service rsyslog status
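If you are not sure which implementation a node uses, a small check can look for the common daemon binaries and confirm that one of them is running. This is a sketch only. It assumes `pgrep` is installed and checks `syslog-ng` in addition to the daemons named above. Daemon binaries usually live in `/usr/sbin`, which may not be on a non-root user's `PATH`, so run it as root.

```shell
#!/usr/bin/env bash
# Sketch: find an installed syslog daemon and report whether it is running.

# Print the first syslog daemon found on PATH; fail if none is found.
find_syslog_daemon() {
  local d
  for d in rsyslogd syslog-ng syslogd; do
    if command -v "$d" >/dev/null 2>&1; then
      echo "$d"
      return 0
    fi
  done
  return 1
}

# Report whether the detected daemon has a running process (exact name match).
syslog_running() {
  local d
  d=$(find_syslog_daemon) || { echo "no syslog daemon found on PATH"; return 1; }
  if pgrep -x "$d" >/dev/null; then
    echo "$d is running"
  else
    echo "$d is installed but not running"
    return 1
  fi
}
```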
Default umask
Ensure that the default umask for the root user is set to 0022 on all MapR nodes in the cluster. Set the umask in the /etc/profile file, or in the .cshrc or .login file for csh-based shells. The root user must have a 0022 umask because the MapR admin user requires access to all files and directories under the /opt/mapr directory, even those initially created by root services.
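A umask of 0022 removes only the group and other write bits. New files are therefore created with mode 644 and new directories with mode 755, which is what lets a non-root admin user read what root services create. The sketch below shows that arithmetic and adds the setting to a profile file idempotently. `ensure_umask` is a hypothetical helper: it appends the line at the end of the file so that it runs after any conditional umask settings earlier in the file.

```shell
#!/usr/bin/env bash
# Sketch: what umask 0022 yields, and a helper to set it in a profile file.

# Permission bits for a new object: base mode (0666 files, 0777 dirs) minus the mask.
effective_mode() {
  local base=$1 mask=$2   # leading 0 makes bash arithmetic treat these as octal
  printf '%03o\n' $(( base & ~mask ))
}

# Append "umask 0022" to the given file unless that exact line is already present.
ensure_umask() {
  local profile=$1
  grep -qx 'umask 0022' "$profile" || echo 'umask 0022' >> "$profile"
}
```

After running `ensure_umask /etc/profile` and logging in again as root, the `umask` command should print `0022`.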
ulimit
ulimit is a command that sets limits on the user's access to system-wide resources. Specifically, it provides control over the resources available to the shell and to processes started by it.
The mapr-warden script uses the ulimit command to set the maximum number of file descriptors (nofile) and processes (nproc) to 64000. Higher values are unlikely to result in an appreciable performance gain. Lower values, such as the default value of 1024, are likely to result in task failures.
Depending on your environment, you might want to set limits manually rather than relying on Warden to set them automatically using ulimit.
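If you set the limits manually, you can check the current shell against the 64000 value Warden uses and generate matching `/etc/security/limits.conf` entries. This is a sketch. `MAPR_USER=mapr` is an assumption, so substitute the account your MapR services run as. The `-` type in limits.conf sets both the soft and hard limit. limits.conf is applied by `pam_limits` at login, so processes started some other way (for example, directly by systemd) may not pick it up.

```shell
#!/usr/bin/env bash
# Sketch: compare current limits with Warden's 64000 and print limits.conf lines.
REQUIRED_LIMIT=64000
MAPR_USER=${MAPR_USER:-mapr}   # assumption: the MapR service account name

# True if a ulimit value (a number or "unlimited") is at least the minimum.
limit_ok() {
  local value=$1 min=$2
  [ "$value" = "unlimited" ] || [ "$value" -ge "$min" ]
}

# Check open files (ulimit -n) and user processes (ulimit -u) for this shell.
check_limits() {
  local nofile nproc status=0
  nofile=$(ulimit -n)
  nproc=$(ulimit -u)
  limit_ok "$nofile" "$REQUIRED_LIMIT" || { echo "nofile is $nofile (want >= $REQUIRED_LIMIT)"; status=1; }
  limit_ok "$nproc" "$REQUIRED_LIMIT" || { echo "nproc is $nproc (want >= $REQUIRED_LIMIT)"; status=1; }
  return $status
}

# limits.conf format: <domain> <type> <item> <value>; "-" sets soft and hard.
limits_conf_entries() {
  printf '%s - nofile %s\n' "$MAPR_USER" "$REQUIRED_LIMIT"
  printf '%s - nproc %s\n' "$MAPR_USER" "$REQUIRED_LIMIT"
}
```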
PAM
Nodes that will run the MapR Control System (the mapr-webserver service) can take advantage of Pluggable Authentication Modules (PAM) if PAM is present on the node. Configuration files in the /etc/pam.d/ directory are typically provided for each standard Linux command. MapR can use, but does not require, its own PAM profile.
Security - SELinux, AppArmor
SELinux (or the equivalent on other operating systems) must be disabled during the install procedure. If the MapR services run as a non-root user, SELinux can be enabled after installation and while the cluster is running.
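On RHEL-family systems, the boot-time SELinux mode is read from `SELINUX=` in `/etc/selinux/config`. Changing that value takes effect at the next reboot. `getenforce` shows the current mode, and `setenforce 0` switches a running system to permissive (not disabled). The sketch below reads and rewrites only that config line. The file path is a parameter so you can try it on a copy first. It does not cover AppArmor.

```shell
#!/usr/bin/env bash
# Sketch: read and change the boot-time SELinux mode in its config file.

# Print the value of the SELINUX= line (e.g. enforcing, permissive, disabled).
selinux_config_mode() {
  local conf=${1:-/etc/selinux/config}
  awk -F= '$1 == "SELINUX" { print $2 }' "$conf"
}

# Rewrite SELINUX=... to SELINUX=disabled, leaving SELINUXTYPE and comments alone.
disable_selinux_config() {
  local conf=${1:-/etc/selinux/config} tmp
  tmp=$(mktemp) || return 1
  awk -F= 'BEGIN { OFS = "=" } $1 == "SELINUX" { $2 = "disabled" } { print }' "$conf" > "$tmp" \
    && cat "$tmp" > "$conf"   # cat keeps the original file's ownership and mode
  local rc=$?
  rm -f "$tmp"
  return $rc
}
```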
TCP Retries
Set net.ipv4.tcp_retries2 to 5 so that MapR can detect unreachable nodes with less latency, and set net.ipv4.tcp_syn_retries to 4. Do this on each node:
- Edit the file /etc/sysctl.conf and add the following lines:
net.ipv4.tcp_retries2=5
net.ipv4.tcp_syn_retries=4
- Save the file and run:
sysctl -p
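Editing /etc/sysctl.conf by hand on many nodes is error-prone, especially if a key is already present with another value. The sketch below uses a hypothetical helper, `set_sysctl_key`, that replaces any existing line for the key (with or without spaces around `=`) and appends the key if it is absent. Rerunning it is therefore harmless.

```shell
#!/usr/bin/env bash
# Sketch: idempotently set "key=value" in a sysctl-style file.
set_sysctl_key() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    {
      # Extract the key part of "key = value" / "key=value"; comment lines keep their "#".
      name = $0
      sub(/=.*/, "", name)
      gsub(/[ \t]/, "", name)
      if (name == k) {
        if (!done) print k "=" v   # replace the first occurrence, drop duplicates
        done = 1
        next
      }
      print
    }
    END { if (!done) print k "=" v }   # append if the key was not present
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  local rc=$?
  rm -f "$tmp"
  return $rc
}
```

On a node, run `set_sysctl_key /etc/sysctl.conf net.ipv4.tcp_retries2 5` and `set_sysctl_key /etc/sysctl.conf net.ipv4.tcp_syn_retries 4`, then `sysctl -p`. Afterward, `sysctl -n net.ipv4.tcp_retries2` should print `5`.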
NFS
Disable the stock Linux NFS server on nodes that will run the MapR NFS server.
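On systemd-based hosts, disabling the stock server means stopping and disabling its unit, whose name differs by distribution. Both servers would otherwise compete for the standard NFS port (2049). The unit names in this sketch are assumptions for common current distributions: `nfs-server` on RHEL/CentOS 7 and later and Fedora, and `nfs-kernel-server` on Debian and Ubuntu. Older init-script systems and other distributions may use different names.

```shell
#!/usr/bin/env bash
# Sketch: stop and disable the distribution's kernel NFS server (systemd hosts).

# Map an /etc/os-release ID to the assumed NFS server unit name.
nfs_unit_for() {
  case "$1" in
    ubuntu|debian) echo nfs-kernel-server ;;
    rhel|centos|fedora) echo nfs-server ;;
    *) return 1 ;;
  esac
}

disable_stock_nfs() {
  local id unit
  id=$(. /etc/os-release && echo "$ID")
  unit=$(nfs_unit_for "$id") || { echo "unknown distribution: $id" >&2; return 1; }
  systemctl stop "$unit"      # stop the running server now
  systemctl disable "$unit"   # keep it from starting at boot
}
```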
iptables/firewalld
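Enabling iptables on a node may close ports that are used by MapR. If you enable iptables, make sure that the required ports remain open. Check your current IP table rules with the following command:
$ service iptables status
If the node runs firewalld instead (the default on RHEL/CentOS 7), disable it so that it does not start at boot:
systemctl disable firewalld
Because disable only affects the next boot, also stop the running service:
systemctl stop firewalld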
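If you keep firewalld running instead, the required MapR ports must be opened explicitly. The sketch below opens a list of TCP ports permanently and then reloads firewalld. The port numbers in the usage line are placeholders (7222 and 5660 are commonly cited as the CLDB and MapR-FS ports); take the full list from the MapR port reference for your version. `FIREWALL_CMD` is a hypothetical hook that lets you do a dry run by setting it to `echo firewall-cmd`.

```shell
#!/usr/bin/env bash
# Sketch: open TCP ports permanently in firewalld, then reload the rules.
# Set FIREWALL_CMD="echo firewall-cmd" to print the commands instead of running them.
FIREWALL_CMD=${FIREWALL_CMD:-firewall-cmd}

open_tcp_ports() {
  local port
  for port in "$@"; do
    # Word splitting of $FIREWALL_CMD is intentional so the dry-run form works.
    $FIREWALL_CMD --permanent --add-port="${port}/tcp" || return 1
  done
  $FIREWALL_CMD --reload   # apply the permanent configuration to the running firewall
}
```

For example: `open_tcp_ports 7222 5660`.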
Transparent Huge Pages (THP)
For data-intensive workloads, MapR recommends disabling the Transparent Huge Pages (THP) feature in the Linux kernel.
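Run the following commands as root. The change takes effect immediately but does not persist across reboots.
RHEL example (RHEL 6 kernels; RHEL 7 and later use the same transparent_hugepage path as Ubuntu):
$ echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
Ubuntu example:
$ echo never > /sys/kernel/mm/transparent_hugepage/enabled
$ echo never > /sys/kernel/mm/transparent_hugepage/defrag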
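The kernel reports the active THP setting by bracketing it, for example `always madvise [never]`. The sketch below picks whichever control directory the kernel provides (the standard one or the RHEL 6 `redhat_` variant), reports the current mode, and sets both `enabled` and `defrag` to `never`. The base directory is a parameter only so that the logic can be tried on a scratch directory. Against the real `/sys/kernel/mm` it must run as root, and the setting still resets at reboot unless you also pass `transparent_hugepage=never` on the kernel command line.

```shell
#!/usr/bin/env bash
# Sketch: locate the THP controls, report the mode, and disable THP.

# Print the THP control directory under the given mm directory (default: real sysfs).
thp_dir() {
  local mm=${1:-/sys/kernel/mm} d
  for d in "$mm/transparent_hugepage" "$mm/redhat_transparent_hugepage"; do
    if [ -f "$d/enabled" ]; then
      echo "$d"
      return 0
    fi
  done
  return 1
}

# Extract the bracketed (active) value, e.g. "never" from "always madvise [never]".
thp_mode() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Write "never" to enabled and, if present, defrag.
disable_thp() {
  local d
  d=$(thp_dir "$1") || return 1
  echo never > "$d/enabled"
  if [ -f "$d/defrag" ]; then
    echo never > "$d/defrag"
  fi
}
```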
Automated Configuration
Some users find tools like Puppet or Chef useful to configure each node in a cluster. Make sure, however, that any configuration tool does not reset changes made when MapR packages are later installed. Specifically, do not let automated configuration tools overwrite changes to the following files:
- /etc/sudoers
- /etc/sysctl.conf
- /etc/security/limits.conf
- /etc/udev/rules.d/99-mapr-disk.rules
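One way to catch a configuration tool that has overwritten these files is to record their checksums right after the MapR packages are installed and compare them after each configuration-management run. This is a sketch using GNU coreutils `sha256sum`. The snapshot location in the usage line is hypothetical, and reading /etc/sudoers requires root.

```shell
#!/usr/bin/env bash
# Sketch: detect changes to files that MapR installation modifies.
MAPR_MANAGED_FILES=(
  /etc/sudoers
  /etc/sysctl.conf
  /etc/security/limits.conf
  /etc/udev/rules.d/99-mapr-disk.rules
)

# Record "checksum  path" lines for the given files into a snapshot file.
snapshot_files() {
  local out=$1
  shift
  sha256sum "$@" > "$out"
}

# Succeeds only if every file still matches; --quiet prints just the files that changed.
verify_snapshot() {
  sha256sum --quiet -c "$1"
}
```

For example, run `snapshot_files /var/tmp/mapr-config.sha256 "${MAPR_MANAGED_FILES[@]}"` after installing MapR, then `verify_snapshot /var/tmp/mapr-config.sha256` after each Puppet or Chef run.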