Configuring MapR Gateways for Binary-Table Replication, Stream Replication, or Indexing

Configuring gateways involves deciding where to place them, installing the mapr-gateway package on the nodes that you choose to host the gateways, and specifying to your source MapR cluster where the gateways are located.

Decide where you want to put your gateways.

  • If you are going to replicate MapR-DB tables, you can choose one of the topologies that is listed in the topic Replicating MapR-DB Tables.
  • If you are going to replicate streams, you can choose one of the topologies that is listed in Replicating Streams.
  • If you are going to index MapR-DB tables, see MapR Gateways to find out where you can put gateways that you want to use for indexing.
Note: Gateways perform negligible disk I/O and use negligible amounts of memory, but they do require significant CPU.

However, the resource that gateways use the most is network bandwidth. For example, if the peak network throughput for puts is about 40 MB per second per node, in a 10-node source cluster the peak network throughput will be about 400 MB per second. So, the aggregate network throughput required on the nodes running gateways will be 400 MB per second for both incoming and outgoing traffic. The aggregate network throughput for a 50-node cluster would be 2 GB per second.

For another example, in this diagram there are two source clusters of three nodes each and the clusters are replicating to one destination cluster. The peak traffic on the gateways will be 40 MB per second per cluster node, which means that these gateways together will experience a peak network load of 240 MB per second.

Although the load is balanced across the two gateways, so that each gateway experiences a peak network load of 120 MB per second, each gateway should be able to tolerate the full aggregate network load in case the other gateway fails unexpectedly.
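
The sizing arithmetic in the two examples above can be sketched as a pair of shell helpers. The function names are hypothetical, and the 40 MB per second figure is only the example value used in the text, not a measured number for any particular cluster.

```shell
# Aggregate peak throughput (MB/s) that the gateway nodes must carry,
# in each direction.
#   $1 = total number of nodes across all source clusters
#   $2 = peak put throughput per source node, in MB/s
aggregate_mbps() {
  echo $(( $1 * $2 ))
}

# Peak load per gateway (MB/s) when the load is balanced evenly.
# Integer division, so the result is rounded down.
#   $1 = aggregate MB/s
#   $2 = number of gateways
per_gateway_mbps() {
  echo $(( $1 / $2 ))
}
```

For the two three-node source clusters, `aggregate_mbps 6 40` gives the 240 MB per second aggregate, and `per_gateway_mbps 240 2` gives the balanced share per gateway. Because a single surviving gateway must absorb everything, the aggregate figure is the one to size each gateway against.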


You can specify the location of the gateways in one of three ways. To locate the gateways that are running in a destination cluster, MapR-DB follows this process:
  1. Look up the name of the destination cluster and the addresses of the gateways in the information specified by the maprcli cluster gateway set command. If no list of gateways for the destination cluster is found, proceed to step 2.
  2. Perform a DNS lookup of the destination cluster and the addresses of the gateways. If no DNS record for the destination cluster is found, proceed to step 3.
  3. In the mapr-clusters.conf file, look up the name of the destination cluster and the addresses of the CLDB nodes in that cluster, under the assumption that gateways are running on all of the CLDB nodes and only on those nodes.
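
The fallback order above can be illustrated with a small shell function. This is only a sketch of the decision order, not how MapR-DB implements the lookup. The function name and its three string arguments are hypothetical stand-ins for the results of each lookup.

```shell
# Illustrative only: report the first source that yields a gateway list,
# in the same order as the three steps above.
#   $1 = gateways registered with "maprcli cluster gateway set" (may be empty)
#   $2 = gateways found in the DNS record (may be empty)
#   $3 = CLDB nodes listed in mapr-clusters.conf
pick_gateway_source() {
  if [ -n "$1" ]; then
    echo "maprcli: $1"
  elif [ -n "$2" ]; then
    echo "dns: $2"
  else
    echo "mapr-clusters.conf: $3"
  fi
}
```

For example, `pick_gateway_source "" "gw1ny gw2ny" "cldb1"` reports the DNS entry, because no list was registered with maprcli.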

Although it is possible to use a single gateway to replicate tables or streams to a MapR cluster, or to index tables in an Elasticsearch cluster, the recommended practice is to configure at least two, so that replication and indexing can continue even if one gateway fails. Source MapR clusters will distribute requests among the gateways in a round-robin fashion.

  1. Install the mapr-gateway package on each node that you want to run a gateway on. See Installing MapR Software. When you run the configure.sh script after installing the package, for the -N parameter specify the name of the MapR cluster that the gateway belongs to.
  2. By default, gateways use port 7660. If you want to change the port that a gateway is using, follow these steps:
    1. On the node where the gateway is running, edit the /opt/mapr/conf/gateway.conf file, uncommenting the line #gateway.port=7660 and changing the port number.
    2. After saving the file, restart the gateway by running this command: maprcli node services -name gateway -action restart
  3. On every source MapR cluster, specify the location of the gateways by using one of these methods:
    • Using the maprcli cluster gateway set command
      The syntax of this command is:
      maprcli cluster gateway set -dstcluster <cluster name> -gateways "<space-delimited list of gateways>"

      To generate a list of the gateways in a MapR cluster that you can paste into the maprcli cluster gateway set command, run the command maprcli cluster gateway local -format text on the cluster where the gateways are located. If you want to run the command from a different cluster and point to the cluster where the gateways are located, use the -cluster parameter to provide the name of that cluster.

      For an example of running the maprcli cluster gateway set command, suppose that you are configuring table replication from the cluster sanfrancisco to the cluster newyork and want to use two gateways. The nodes on which these gateways are located are named gw1 and gw2.

      The command that you run will look like this:

      maprcli cluster gateway set -dstcluster newyork -gateways "gw1.bigcompany.com gw2.bigcompany.com"
    • Adding a DNS record to your DNS server's zone file for your domain

      In your DNS server’s zone file for your domain, add a record for the cluster where gateways are located, listing the nodes to use as gateways. You can use the MapR Control System (MCS) to create a record that you can copy into a DNS configuration file, run a maprcli command to generate the record, or create a record manually.

      To create a record with the MCS, follow these steps:

      1. Log in to MCS on the cluster where the gateways are located.
      2. In the Navigation pane, select MapR-DB Tables.
      3. In the MapR-DB Tables section, click the button Generate Gateway DNS. A window opens with the generated DNS entry.
      4. Copy and paste the record into your zone file.

      To generate a record by using maprcli, follow these steps:

      1. On the cluster where the gateways are located, run the command maprcli cluster gateway local -format dns. If you want to run the command from a different cluster and point to the cluster that hosts the gateways, use the -cluster parameter to provide the name of the latter cluster.
      2. Copy and paste the output of this command into your zone file.
      If you want to create a record manually, use this format:
      gateway.<clustername> IN TXT "<space-delimited list of hostnames>"
      You can also specify IP addresses, though using hostnames is recommended so that it is easier to locate gateways if their IP addresses change. Combinations of hostnames and IP addresses are also supported. The default port is 7660. If a gateway is using a different port, append a colon to the address and then specify the port number. Here is an example entry:
      gateway.newyork.bigcompany.com IN TXT "gw1ny.bigcompany.com gw2ny.bigcompany.com"
      Multi-homing is also supported. Simply separate the entries for a single node with semicolons, as in this example that uses IP addresses:
      gateway.newyork.bigcompany.com IN TXT "10.10.34.20 10.10.34.22 10.10.34.24;173.194.79.121"
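
As a rough aid for the first two methods in step 3, the following shell helpers assemble the maprcli command line and a manual DNS TXT record from a list of gateway addresses. Both function names are hypothetical. The helpers only print text in the formats shown above and do not contact a cluster or a DNS server.

```shell
# Print the "maprcli cluster gateway set" command for a destination cluster.
#   $1 = destination cluster name; remaining arguments = gateway addresses
gateway_set_command() {
  local dst="$1"
  shift
  # "$*" joins the remaining arguments with single spaces
  printf 'maprcli cluster gateway set -dstcluster %s -gateways "%s"\n' "$dst" "$*"
}

# Print a TXT record in the manual format described above.
#   $1 = the name that follows "gateway."; remaining arguments = addresses.
# An address may be a hostname, an IP address, host:port for a non-default
# port, or "addr;addr" (quoted) for a multi-homed node.
gateway_txt_record() {
  local name="$1"
  shift
  printf 'gateway.%s IN TXT "%s"\n' "$name" "$*"
}
```

For example, `gateway_set_command newyork gw1.bigcompany.com gw2.bigcompany.com` prints the same command as the sanfrancisco-to-newyork example. A multi-homed node has to be quoted on the command line, as in `gateway_txt_record newyork 10.10.34.20 '10.10.34.24;173.194.79.121'`, so that the shell does not treat the semicolon as a command separator.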

On source clusters:

  • To see a list of the gateways for a particular destination cluster, use the maprcli cluster gateway get command. Specify the name of the destination cluster with the -dstcluster parameter. If you run the command remotely from your source cluster, specify the name of the source cluster with the -cluster parameter.
  • To see a list of the gateways for all of the destination clusters that the source cluster is replicating to, use the maprcli cluster gateway list command. If you run the command remotely from your source cluster, specify the name of the source cluster with the -cluster parameter.
  • To remove the list of gateways that you specified for a destination cluster by using the maprcli cluster gateway set command, use the maprcli cluster gateway delete command. Specify the name of the destination cluster with the -dstcluster parameter. If you run the command remotely from your source cluster, specify the name of the source cluster with the -cluster parameter.
  • To find out whether MapR-DB or MapR Streams is finding gateways from DNS records, lists created by the maprcli cluster gateway set command, or the mapr-clusters.conf file, run the command maprcli cluster gateway resolve. Specify the name of the destination cluster with the -dstcluster parameter. If you run the command remotely from your source cluster, specify the name of the source cluster with the -cluster parameter.
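
The read-only commands in the list above can be collected into a small helper that prints them for a given destination cluster. The function name is hypothetical. The helper prints the commands rather than running them, and it leaves out the delete command because that one changes the configuration.

```shell
# Print (do not run) the inspection commands for one destination cluster.
#   $1 = destination cluster name
#   $2 = optional source cluster name, added with -cluster for remote use
gateway_inspect_commands() {
  local dst="$1" remote=""
  if [ -n "${2:-}" ]; then
    remote=" -cluster $2"
  fi
  echo "maprcli cluster gateway get -dstcluster $dst$remote"
  echo "maprcli cluster gateway list$remote"
  echo "maprcli cluster gateway resolve -dstcluster $dst$remote"
}
```

Running `gateway_inspect_commands newyork sanfrancisco` prints the three commands with `-cluster sanfrancisco` appended, which is the form to use when you are not logged in to a node of the source cluster.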

On clusters where gateways are running:

  • If you need to stop and start one or more gateways, you can run these commands:
    maprcli node services -name gateway -action stop -nodes <hostnames or IP addresses>
    maprcli node services -name gateway -action start -nodes <hostnames or IP addresses> 

    Hostnames and IP addresses are separated by spaces.

  • To check the status of a gateway service on a particular node, run the command maprcli service list.

    If a gateway fails, the warden service tries three times to restart it automatically. After an interval, the warden tries again three times to start the gateway. You can configure the interval by using the parameter services.retryinterval.time.sec in the warden.conf file. The default is 30 minutes.

    During the time that the gateway is down, source clusters will resend updates to other gateways. Source clusters will also ping the failed gateway with an exponentially increasing backoff.

    If all of the gateways fail in a destination cluster, source clusters will ping the failed gateways in the same manner. Updates pending replication are stored on disk in an internal data structure until at least one gateway in the destination cluster comes back online. Therefore, you will see additional storage costs during a gateway outage. The Gateway Service Down alarm in MCS will notify you when none of the gateways in a destination cluster can be reached.

    If the additional storage becomes too costly, you can follow whichever of these procedures matches your replication target:

    If you are replicating to a MapR-DB binary table:

    1. Run the maprcli table replica remove command to stop replicating to the replica. This action deletes the pending updates.
    2. Resolve the gateway outage.
    3. Recreate the replica and start replicating to it by running the maprcli table replica autosetup command.

    If you are replicating to a MapR Streams stream:

    1. Run the maprcli stream replica remove command to stop replicating to the replica stream. This action cancels the pending updates to the replica stream.
    2. Resolve the gateway outage.
    3. Run the command maprcli stream replica autosetup to recreate the replica stream and start replicating to it.

    If you are replicating to an Elasticsearch type:

    1. Run the maprcli table replica elasticsearch remove command to stop replicating to the type. This action deletes the pending updates.
    2. Resolve the gateway outage.
    3. Delete the type from the Elasticsearch cluster.
    4. Recreate and load the Elasticsearch type with the maprcli table replica elasticsearch autosetup command. If you originally created the type manually because you set up custom mapping of MapR-DB data to data types other than string, first recreate the type with the manual method that you used. Then, run the maprcli table replica elasticsearch autosetup command.
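
Because the length of a gateway outage drives the extra storage cost, it can help to know how long the warden waits between its rounds of restart attempts. The sketch below reads services.retryinterval.time.sec from a warden.conf file and falls back to 1800 seconds, which is the 30-minute default expressed in seconds. The function name is hypothetical, and the key=value line format is an assumption about the file layout.

```shell
# Print the warden retry interval in seconds.
#   $1 = path to warden.conf
# Falls back to 1800 (30 minutes) when the file is unreadable or the key
# is not set. Assumes one "key=value" setting per line.
warden_retry_interval_sec() {
  local value
  [ -r "$1" ] || { echo 1800; return; }
  value=$(sed -n 's/^services\.retryinterval\.time\.sec=\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1)
  echo "${value:-1800}"
}
```

Pass the function the path to the warden.conf file on the node that hosts the gateway.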

You can refer to two log files for each gateway when troubleshooting. Both are in the /opt/mapr/logs directory on the node where the gateway is running:

  • gateway.log
  • gatewayinit.log
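
When troubleshooting, a quick look at the tail of both files is often a reasonable first step. The helper below is a minimal sketch with a hypothetical function name. It makes no assumption about the log format and simply prints the last lines of each file that exists.

```shell
# Print the last lines of both gateway log files.
#   $1 = log directory (defaults to /opt/mapr/logs)
#   $2 = number of lines to show per file (defaults to 20)
show_gateway_logs() {
  local dir="${1:-/opt/mapr/logs}" lines="${2:-20}" f
  for f in gateway.log gatewayinit.log; do
    if [ -r "$dir/$f" ]; then
      echo "== $f =="
      tail -n "$lines" "$dir/$f"
    else
      echo "== $f (not found) =="
    fi
  done
}
```

Run `show_gateway_logs` on the node where the gateway is running, or pass a different directory if you have copied the logs elsewhere.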