Gateways for Indexing MapR-DB Data in Elasticsearch

When you index MapR-DB tables in Elasticsearch, MapR-DB replicates table updates to corresponding Elasticsearch types. The MapR-DB tables are in MapR clusters, and the types are in Elasticsearch clusters. Gateways receive table updates and pass them to nodes in Elasticsearch clusters.

You can place gateways on existing nodes in your source MapR cluster, on existing nodes in your Elasticsearch cluster, or on nodes that belong to neither cluster. Choose whichever location gives the best network performance, both from the source MapR cluster to the gateways and from the gateways to the Elasticsearch cluster.

Note: Only MapR-DB binary tables can be replicated to Elasticsearch.
Note: When gateways are on nodes that are part of an Elasticsearch cluster, the gateways are invisible to Elasticsearch. You manage all gateways from the source MapR cluster. When you run the maprcli cluster gateway set command, set the -dstcluster parameter to the name of the source MapR cluster. MapR-DB then knows that indexing to the Elasticsearch cluster goes through the gateways you specify.
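The unusual part of the note is that -dstcluster names the source cluster rather than a remote one. The following Python sketch only assembles and prints the maprcli invocation so that this detail is explicit. The cluster and host names are hypothetical, and passing all gateways as one space-separated -gateways value is an assumption; confirm the exact syntax in the maprcli reference for your release.

```python
import shlex
import subprocess
from typing import List, Sequence


def gateway_set_command(source_cluster: str, gateways: Sequence[str]) -> List[str]:
    """Build the argv for `maprcli cluster gateway set` for Elasticsearch indexing.

    -dstcluster is set to the *source* MapR cluster, as the note describes.
    The space-separated -gateways value is an assumption; check your release's docs.
    """
    if not gateways:
        raise ValueError("at least one gateway node is required")
    return [
        "maprcli", "cluster", "gateway", "set",
        "-dstcluster", source_cluster,
        "-gateways", " ".join(gateways),
    ]


def run(argv: List[str], dry_run: bool = True) -> None:
    # Print the command as it would be typed in a shell. Execute it only when
    # dry_run is False (requires maprcli on a node of the source MapR cluster).
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical cluster and host names, for illustration only.
    cmd = gateway_set_command("my.source.cluster", ["gw1.example.com", "gw2.example.com"])
    run(cmd)
```

With the default dry_run=True, the script prints the command instead of running it, so you can review it before executing it on the source cluster.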

Wherever you place the MapR gateways that you use for indexing, they become part of the source MapR cluster, for two reasons:

  • After you install the mapr-gateway package on a node, you run the configure.sh script and supply the name of the MapR cluster that the gateway belongs to as the value of the -N parameter.
  • When you tell your source MapR cluster where the gateways are, you list the gateway nodes together with the name of the MapR cluster that they belong to.
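The first bullet can be sketched in Python as a helper that assembles the configure.sh command for a gateway node. It assumes the default install path /opt/mapr/server/configure.sh and includes the -C (CLDB nodes) and -Z (ZooKeeper nodes) parameters that a first run of configure.sh normally needs. Treat that exact parameter set as an assumption for your release. All host and cluster names are hypothetical.

```python
import shlex
from typing import List, Sequence

# Assumed default location of the MapR configuration script.
CONFIGURE_SH = "/opt/mapr/server/configure.sh"


def configure_gateway_command(cluster_name: str, cldb_nodes: Sequence[str],
                              zk_nodes: Sequence[str]) -> List[str]:
    """Build the configure.sh argv for a node with mapr-gateway installed.

    -N is the source MapR cluster the gateway belongs to, wherever the node
    physically sits. -C and -Z (comma-separated host lists) are assumed to be
    required on a first run; verify against your release's documentation.
    """
    return [
        CONFIGURE_SH,
        "-N", cluster_name,
        "-C", ",".join(cldb_nodes),
        "-Z", ",".join(zk_nodes),
    ]


if __name__ == "__main__":
    # Hypothetical names; the printed command would be run on the gateway node.
    argv = configure_gateway_command("my.source.cluster",
                                     ["cldb1.example.com"],
                                     ["zk1.example.com", "zk2.example.com"])
    print(shlex.join(argv))
```

The helper only prints the command. It does not run configure.sh, which must be executed with appropriate privileges on the gateway node itself.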

As a result, you can manage gateways with maprcli while you are logged in to the source MapR cluster. Gateways can also access information on the source MapR cluster about the source tables and their corresponding Elasticsearch types.