Security Architecture

A secure MapR cluster provides the following specific security elements:
  • Communication between the nodes in the cluster is encrypted:
    • HBase traffic is secured with Kerberos.
    • NFS traffic between the server and cluster, traffic within MapR-FS, and CLDB traffic are encrypted with secure MapR RPCs.
    • Traffic between JobClients, TaskTrackers, and JobTrackers is secured with MAPRSASL, an implementation of the Simple Authentication and Security Layer framework.
  • Support for Kerberos user authentication.
  • Support for Kerberos encryption for secure communication to open source components that require it.
  • Support for the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO) used with the web UI front ends of some cluster components.

Authentication Architecture: The maprlogin Utility

Explicit User Authentication
When you explicitly generate a ticket, you can authenticate either with your username and password or with Kerberos:
  1. The user invokes the maprlogin utility, which connects to a CLDB node in the cluster using HTTPS. The hostname for the CLDB node is specified in the mapr-clusters.conf file.
    1. When using username/password authentication, the node authenticates using PAM modules with the Java Authentication and Authorization Service (JAAS). The JAAS configuration is specified in the mapr.login.conf file. The system can use any registry that has a PAM module available.
    2. When using Kerberos to authenticate, the CLDB node verifies the Kerberos principal with the keytab file.
  2. After authenticating, the CLDB node uses the standard UNIX APIs getpwnam_r and getgrouplist, which are controlled by the /etc/nsswitch.conf file, to determine the user's user ID and group ID.
  3. The CLDB node generates a ticket and returns it to the client machine.
  4. When the client presents the ticket to a server in the cluster, the server validates that the ticket is properly encrypted, to verify that the ticket was issued by the cluster's CLDB.
  5. The server also verifies that the ticket has not expired or been blacklisted.
  6. The server checks the ticket for the presence of a privileged identity such as the mapr user. Privileged identities have impersonation functionality enabled.
  7. The ticket's user and group information are used for authorization to the cluster, unless impersonation is in effect.
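
The server-side checks in steps 4 through 7 can be sketched as a single function. This is a minimal illustration, not the MapR ticket format or API: the Ticket fields, the PRIVILEGED_USERS set, the authorize function, and the blacklist modeled as a set of user names are all hypothetical, and the issued_by_cldb flag merely stands in for the real encryption check.

```python
from dataclasses import dataclass
import time

# Hypothetical names throughout; this is not the MapR ticket format or API.
PRIVILEGED_USERS = {"mapr"}  # identities that may impersonate other users


@dataclass(frozen=True)
class Ticket:
    user: str
    groups: tuple
    expires_at: float      # seconds since the epoch
    issued_by_cldb: bool   # stands in for "decrypts with the cluster's key"


def authorize(ticket, blacklist, now=None, impersonate=None):
    """Return the (user, groups) identity to authorize, or raise PermissionError."""
    now = time.time() if now is None else now
    # Step 4: the ticket must have been issued by this cluster's CLDB.
    if not ticket.issued_by_cldb:
        raise PermissionError("ticket was not issued by this cluster")
    # Step 5: reject expired or blacklisted tickets.
    if now >= ticket.expires_at:
        raise PermissionError("ticket has expired")
    if ticket.user in blacklist:
        raise PermissionError("ticket is blacklisted")
    # Step 6: only privileged identities may impersonate another user.
    if impersonate is not None:
        if ticket.user not in PRIVILEGED_USERS:
            raise PermissionError("impersonation requires a privileged identity")
        # Groups of the impersonated user would be looked up separately;
        # this sketch leaves them empty.
        return impersonate, ()
    # Step 7: otherwise the ticket's own user and groups are used.
    return ticket.user, ticket.groups
```

For example, authorize(Ticket("alice", ("staff",), expires_at, True), blacklist=set()) returns alice's own identity, while passing impersonate="bob" succeeds only when the ticket belongs to a privileged identity such as mapr.
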
Implicit Authentication with Kerberos
On clusters that use Kerberos for authentication, a MapR ticket is implicitly obtained for a user who runs a MapR command without first using the maprlogin utility. The implicit authentication flow for the maprlogin utility first checks for a valid ticket for the user, and uses that ticket if it exists. If a ticket does not exist, the maprlogin utility checks whether Kerberos is enabled for the cluster, then checks for an existing valid Kerberos identity. When the maprlogin utility finds a valid Kerberos identity, it generates a ticket for that Kerberos identity.
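
The order of checks in the implicit flow can be sketched as follows. The function name, its parameters, and the dictionaries standing in for the ticket store and the Kerberos credential cache are assumptions made for illustration; the real utility works with ticket files and Kerberos credentials, not Python dictionaries.

```python
def implicit_login(user, tickets, kerberos_enabled, kerberos_identities):
    """Return a ticket for `user`, or None if none can be obtained implicitly.

    All arguments are hypothetical stand-ins:
      tickets             -- dict of user -> existing valid ticket
      kerberos_enabled    -- whether the cluster uses Kerberos
      kerberos_identities -- dict of user -> valid Kerberos principal
    """
    # 1. Use an existing valid ticket if the user already has one.
    ticket = tickets.get(user)
    if ticket is not None:
        return ticket
    # 2. Without Kerberos on the cluster there is no implicit path.
    if not kerberos_enabled:
        return None
    # 3. Look for an existing valid Kerberos identity.
    principal = kerberos_identities.get(user)
    if principal is None:
        return None
    # 4. Generate a ticket for that Kerberos identity (a placeholder string here).
    return f"ticket-for-{principal}"
```

A user with neither a ticket nor a Kerberos identity gets None back, which corresponds to having to run maprlogin explicitly.
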

Authorization Architecture: ACLs and ACEs

An Access Control List (ACL) is a list of users or groups. Each user or group in the list is paired with a defined set of permissions that limit the actions that the user or group can perform on the object secured by the ACL. In MapR, the objects secured by ACLs are the job queue, volumes, and the cluster itself.

A job queue ACL controls who can submit jobs to a queue, kill jobs, or modify their priority. A volume-level ACL controls which users and groups have access to that volume, and what actions they may perform, such as mirroring the volume, altering the volume properties, dumping or backing up the volume, or deleting the volume.
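
As a rough model, an ACL can be thought of as a mapping from each listed user or group to the set of actions it may perform. The sketch below assumes that a user's effective permissions are the union of the user's own entry and the entries of the user's groups; the permission names and the is_allowed helper are hypothetical and are not MapR identifiers.

```python
# Hypothetical volume ACL: (kind, name) -> permitted actions.
volume_acl = {
    ("user", "alice"): {"mirror", "dump"},
    ("group", "ops"): {"mirror", "alter", "dump", "delete"},
}


def is_allowed(acl, user, groups, action):
    """Check `action` against the user's entry plus the entries of the user's groups."""
    granted = set(acl.get(("user", user), set()))
    for group in groups:
        granted |= acl.get(("group", group), set())
    return action in granted
```

Here alice alone may mirror or dump the volume, but she may delete it only if she is also a member of the ops group.
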

An Access Control Expression (ACE) is a combination of user, group, and role definitions. A role is a property of a user or group that defines a set of behaviors that the user or group performs regularly. You can use roles to implement your own custom authorization rules. ACEs are used to secure MapR tables that use native storage.
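
One way to picture an ACE is as a boolean expression evaluated against the requesting subject. The sketch below assumes that user, group, and role terms can be combined with AND, OR, and NOT; it models the expression with small Python functions and does not reproduce MapR's ACE syntax. Subject and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    user: str
    groups: frozenset = frozenset()
    roles: frozenset = frozenset()


# Each term is a predicate over a Subject (hypothetical helpers).
def user(name):
    return lambda s: s.user == name


def group(name):
    return lambda s: name in s.groups


def role(name):
    return lambda s: name in s.roles


def any_of(*terms):
    return lambda s: any(t(s) for t in terms)


def all_of(*terms):
    return lambda s: all(t(s) for t in terms)


def negate(term):
    return lambda s: not term(s)


# "user alice, OR (group analysts AND role reader)"
ace = any_of(user("alice"), all_of(group("analysts"), role("reader")))
```

Calling ace(Subject("bob", frozenset({"analysts"}), frozenset({"reader"}))) grants access, while a bob who is in the analysts group but lacks the reader role is denied.
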

Encryption Architecture: Wire-Level Security

MapR uses a mix of approaches to secure the core work of the cluster and the Hadoop components installed on the cluster.

Nodes in a MapR cluster use different protocols depending on their tasks:
  • The FileServer, JobTracker, and TaskTracker use MapR tickets to secure their remote procedure calls (RPCs) with the native MapR security layer. Clients can use the maprlogin utility to obtain MapR tickets. Web UI elements of these components use password security by default, but can also be configured to use SPNEGO.
  • HiveServer2, Flume, and Oozie use MapR tickets by default, but can be configured to use Kerberos.
  • HBase and the Hive metaserver require Kerberos for secure communications.
  • The MCS Web UI is secured with passwords. The MCS Web UI does not support SPNEGO for users, but supports both password and SPNEGO security for REST calls.

Servers must use matching security approaches. When an Oozie server, which supports both MapR tickets and Kerberos, connects to HBase, which supports only Kerberos, Oozie must use Kerberos for outbound security. When servers have both MapR and Kerberos credentials, these credentials must map to the same user ID to prevent ambiguity problems.
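
The matching rule can be illustrated by intersecting the mechanisms that each side supports. In a real cluster the mechanism is set through configuration, not negotiated at run time, so the table and function below are only a hypothetical model of the constraint described above.

```python
# Supported mechanisms per component, default first (hypothetical model
# built from the component descriptions in this section).
SUPPORTED = {
    "Oozie": ["maprticket", "kerberos"],
    "HiveServer2": ["maprticket", "kerberos"],
    "Flume": ["maprticket", "kerberos"],
    "HBase": ["kerberos"],
    "Hive Metaserver": ["kerberos"],
}


def pick_mechanism(client, server):
    """Return the first mechanism the client prefers that the server also supports."""
    for mechanism in SUPPORTED[client]:
        if mechanism in SUPPORTED[server]:
            return mechanism
    raise ValueError(f"{client} and {server} share no security mechanism")


def credentials_consistent(mapr_ticket_uid, kerberos_uid):
    """Both credentials must resolve to the same user ID."""
    return mapr_ticket_uid == kerberos_uid
```

With this model, pick_mechanism("Oozie", "HBase") selects Kerberos, while two components that both default to MapR tickets keep that default.
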

Security Protocols Used by MapR

Protocol | Encryption | Authentication
MapR RPC | AES/GCM | maprticket
Hadoop RPC and MAPRSASL | MAPRSASL | maprticket
Hadoop RPC and Kerberos | Kerberos | Kerberos ticket
Generic HTTP Handler | HTTPS using SSL/TLS | maprticket, username and password, or Kerberos SPNEGO

Security Protocols Listed by Component

Component | Protocols Used
CLDB | Outbound: MapR RPC. Inbound: Custom HTTP handler for the maprlogin utility, which supports authentication through username and password or Kerberos.
MapR-FS | MapR RPC
Task and Job Trackers | Hadoop RPC and MAPRSASL. Traffic to the MapR file system uses MapR RPC.
HBase | Inbound: Hadoop RPC and Kerberos. Outbound: Hadoop RPC and Kerberos. Traffic to the MapR file system uses MapR RPC.
Oozie | Inbound: Generic HTTP Handler by default, configurable for HTTPS using SSL/TLS. Outbound: Hadoop RPC and MAPRSASL by default, configurable to replace MAPRSASL with Kerberos. Traffic to the MapR file system uses MapR RPC.
NFS | Inbound: Unencrypted NFS protocol. Outbound: MapR RPC.
Flume | Inbound: None. Outbound: Hadoop RPC and MAPRSASL by default, configurable to replace MAPRSASL with Kerberos. Traffic to the MapR file system uses MapR RPC.
HiveServer2 | Inbound: Thrift and Kerberos, or username/password over SSL. Outbound: Hadoop RPC and MAPRSASL by default, configurable to replace MAPRSASL with Kerberos. Traffic to the MapR file system uses MapR RPC.
Hive Metaserver | Inbound: Hadoop RPC and Kerberos. Traffic to the MapR file system uses MapR RPC.
MCS | Inbound: User traffic is secured with HTTPS using SSL/TLS and username/password. REST traffic is secured with HTTPS using SSL/TLS with username/password and SPNEGO.
Web UIs | Generic HTTP handler. Single sign-on (SSO) is supported by shared cookies.