Registering Elasticsearch Clusters with MapR Clusters

To register an Elasticsearch cluster with a MapR cluster, you run a script that copies the Elasticsearch cluster’s configuration file (elasticsearch.yml), its JAR files, and its plugin JAR files into MapR-FS on the MapR cluster from which you run the script. When you run the script, you provide the hostname or IP address of the Elasticsearch node from which to copy these files.

Prerequisites

  • Ensure that the Elasticsearch cluster is at version 1.4.
  • Ensure that there are adequate resources in the Elasticsearch cluster to handle the high-speed, high-volume updates sent from MapR-DB. MapR-DB tables can ingest large numbers of puts very rapidly and forward them to the gateways just as rapidly.
  • Ensure that there is enough storage space in the Elasticsearch cluster. Because binary data in MapR-DB is converted into less compact JSON documents, indexing a set of columns in Elasticsearch requires more storage than those same columns occupy in the MapR-DB cluster.
  • Plan which nodes will serve as MapR gateways for communication between MapR-DB and Elasticsearch, install the mapr-gateway package on those nodes, and notify the MapR cluster of the location of those nodes. See MapR Gateways.
  • Find out whether Elasticsearch was installed on the Elasticsearch cluster from a ZIP or TAR file or from a Debian or RPM package. The installation method determines where the registration script looks for the files that it needs to copy, and whether you must supply the -e parameter.
  • Ensure that your user ID has read permission on the Elasticsearch installation directory. If you will specify a different user ID with the -u parameter, ensure that that ID has read permission on this directory.
  • Decide whether to use the actual name of the Elasticsearch cluster during the registration process or a different name. The name under which you register the cluster does not have to match the actual name. You cannot change this name later.
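
The installation method and read-permission prerequisites can be checked on the Elasticsearch node before you run the script. The following Python sketch assumes the stock Elasticsearch 1.x layouts (config/, lib/, and plugins/ under the installation directory for ZIP/TAR installs; /etc/elasticsearch and /usr/share/elasticsearch for Debian/RPM packages). These paths are an assumption, not taken from register-elasticsearch, which may look elsewhere. The names EsLayout, default_layout, and unreadable_paths, and the "zip"/"tar"/"deb"/"rpm" labels, are hypothetical. Run it as the user that the script will connect as.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EsLayout:
    """Where a stock Elasticsearch 1.x installation keeps the files to copy."""
    config_file: str   # elasticsearch.yml
    lib_dir: str       # Elasticsearch JAR files
    plugins_dir: str   # plugin JAR files


def default_layout(install_method: str, install_dir: Optional[str] = None) -> EsLayout:
    """Return the conventional layout for an install method.

    ASSUMPTION: standard Elasticsearch 1.x layouts, not paths read from
    register-elasticsearch itself.
    """
    if install_method in ("deb", "rpm"):
        home = "/usr/share/elasticsearch"
        return EsLayout("/etc/elasticsearch/elasticsearch.yml",
                        os.path.join(home, "lib"),
                        os.path.join(home, "plugins"))
    if install_method in ("zip", "tar"):
        # ZIP/TAR installs are the case where the script needs -e.
        if not install_dir:
            raise ValueError("ZIP/TAR installs need the installation directory (the -e value)")
        return EsLayout(os.path.join(install_dir, "config", "elasticsearch.yml"),
                        os.path.join(install_dir, "lib"),
                        os.path.join(install_dir, "plugins"))
    raise ValueError(f"unknown install method: {install_method!r}")


def unreadable_paths(layout: EsLayout) -> List[str]:
    """Return layout paths the current user cannot read.

    Directories also need execute permission to be listed; missing paths
    are reported too, because os.access returns False for them.
    """
    problems = []
    for path in (layout.config_file, layout.lib_dir, layout.plugins_dir):
        mode = os.R_OK | os.X_OK if os.path.isdir(path) else os.R_OK
        if not os.access(path, mode):
            problems.append(path)
    return problems
```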

About this task

MapR-DB binary tables in MapR clusters can be indexed across multiple Elasticsearch clusters. For example, you could index one set of columns in one Elasticsearch cluster and another set of columns in a second Elasticsearch cluster, with each cluster supporting a different application.

Procedure

To register an Elasticsearch cluster with the MapR cluster on which your source binary tables are located, run the script /opt/mapr/bin/register-elasticsearch.
Supply the following parameters and values:
-c
Specify the name of the Elasticsearch cluster. This name is used only for registering the cluster with the MapR cluster and does not have to match the actual name.
-r
Provide one of the following values:
  • If your MapR gateways are using node clients to communicate with the Elasticsearch cluster, specify the hostname or IP address of an Elasticsearch node from which to copy the elasticsearch.yml file, Elasticsearch JAR files, and plugin JAR files.
    IMPORTANT: Before you run this script, ensure that multicast is disabled in the elasticsearch.yml file and that the unicast host list in that file includes the hostname or IP address of each MapR gateway node.
  • If your MapR gateways are using transport clients to communicate with the Elasticsearch cluster, specify the hostname or IP address of an Elasticsearch transport node from which to copy the elasticsearch.yml file, Elasticsearch JAR files, and plugin JAR files, followed by the hostnames or IP addresses of any Elasticsearch nodes to use as additional transport nodes.
-t
If you specified transport nodes for the -r option (the second value listed above), include this parameter to tell the MapR cluster to use the nodes listed in -r as transport nodes.
-u
Optional: Specify an alternative user ID for connecting to the Elasticsearch cluster by means of scp. The default is the current user ID.
-e
If Elasticsearch was installed by means of a ZIP or TAR file, specify the path to Elasticsearch’s installation directory on the Elasticsearch cluster. If Elasticsearch was installed with a Debian or RPM package, omit this parameter.
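
For node clients, the IMPORTANT note under -r can be checked before registration. This Python sketch assumes elasticsearch.yml has already been parsed and flattened into dotted keys (parsing is not shown), and it reads the Elasticsearch 1.x settings discovery.zen.ping.multicast.enabled and discovery.zen.ping.unicast.hosts. The function name node_client_problems is hypothetical, and it compares plain hostnames or IPv4 addresses only.

```python
from typing import Dict, List, Sequence


def node_client_problems(settings: Dict[str, object], gateway_hosts: Sequence[str]) -> List[str]:
    """Check flattened elasticsearch.yml settings for node-client use.

    ASSUMPTION: settings is a flat dict with parsed YAML values, e.g.
    {"discovery.zen.ping.multicast.enabled": False,
     "discovery.zen.ping.unicast.hosts": ["es1", "gw1:9300"]}.
    """
    problems = []
    # Elasticsearch 1.x enables multicast discovery unless it is set to false.
    if settings.get("discovery.zen.ping.multicast.enabled", True) is not False:
        problems.append("multicast is not disabled")
    hosts = settings.get("discovery.zen.ping.unicast.hosts", [])
    # The host list may also be written as one comma-separated string.
    if isinstance(hosts, str):
        hosts = [h.strip() for h in hosts.split(",") if h.strip()]
    # Drop an optional ":port" suffix (hostnames and IPv4 addresses only).
    listed = {h.rsplit(":", 1)[0] for h in hosts}
    for gw in gateway_hosts:
        if gw not in listed:
            problems.append(f"gateway {gw} missing from unicast hosts")
    return problems
```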

If you plan to call the script from another script, include the -y parameter, which suppresses interactive prompts.
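
When the registration is driven from your own tooling, a small helper can assemble the command line from the parameters above. In this Python sketch, register_argv is a hypothetical name, and joining several -r hosts with commas is an assumption; run the script with -h to confirm the syntax it expects.

```python
from typing import List, Optional, Sequence

SCRIPT = "/opt/mapr/bin/register-elasticsearch"


def register_argv(cluster_name: str,
                  es_hosts: Sequence[str],
                  transport: bool = False,
                  user: Optional[str] = None,
                  install_dir: Optional[str] = None,
                  noninteractive: bool = False) -> List[str]:
    """Build an argument list for register-elasticsearch.

    ASSUMPTION: multiple -r hosts are comma-separated.
    """
    if not es_hosts:
        raise ValueError("-r needs at least one Elasticsearch node")
    if not transport and len(es_hosts) > 1:
        # Node clients copy from a single Elasticsearch node.
        raise ValueError("node-client registration takes a single -r host")
    argv = [SCRIPT, "-c", cluster_name, "-r", ",".join(es_hosts)]
    if transport:
        argv.append("-t")             # use the -r nodes as transport nodes
    if user:
        argv += ["-u", user]          # scp as this user instead of the current one
    if install_dir:
        argv += ["-e", install_dir]   # ZIP/TAR installs only; omit for Debian/RPM
    if noninteractive:
        argv.append("-y")             # suppress interactive prompts
    return argv
```

On a MapR node, you could then pass the result to subprocess.run(argv, check=True).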

To see the help for the script, run the script with the -h parameter only.

What to do next

Set up replication from one or more MapR-DB source binary tables to Elasticsearch types. See Configuring Replication to Elasticsearch Types.

If you want to list the Elasticsearch clusters that are registered with the current MapR cluster, run the script with the -l parameter only.

If you change the elasticsearch.yml file for the cluster, the Elasticsearch JAR files for the cluster, or both, you must re-register the Elasticsearch cluster with the MapR source cluster and restart the MapR gateways that you are using for indexing. Follow these steps:

  1. Pause indexing of your MapR-DB source binary tables. To get a list of the Elasticsearch types that are used for each source table, use the maprcli table replica elasticsearch list command. For each Elasticsearch type, issue the maprcli table replica elasticsearch pause command to pause indexing.
  2. Re-register the Elasticsearch cluster by running the script /opt/mapr/bin/register-elasticsearch. Use the same parameters as you did when you first registered the Elasticsearch cluster. However, this time include the -f parameter to force the registration. This parameter is necessary because you are not unregistering the cluster before registering it again.
  3. Restart the MapR gateways that you are using for indexing. See the section "On clusters where gateways are running" in Configuring a MapR Gateway Master-Slave Topology.
  4. Resume indexing by issuing the command maprcli table replica elasticsearch resume for each Elasticsearch type that you are indexing your data in.
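
The four steps above run in a fixed order, and the same Elasticsearch types that are paused are later resumed. This Python sketch only builds the ordered list of commands and does not run them. The options that identify each type to the pause and resume commands are not described in this section, so the caller supplies them as opaque, hypothetical lists; reregistration_plan is also a hypothetical name. The gateway restart carries no command because it depends on your gateway topology.

```python
from typing import List, Sequence, Tuple

MAPRCLI_ES = ["maprcli", "table", "replica", "elasticsearch"]


def reregistration_plan(registration_argv: Sequence[str],
                        type_options: Sequence[Sequence[str]]) -> List[Tuple[str, List[str]]]:
    """Return (step, command) pairs for re-registering a changed cluster.

    registration_argv: the original register-elasticsearch command line.
    type_options: HYPOTHETICAL per-type maprcli options, supplied by the caller.
    """
    steps = []
    # 1. Pause indexing for every Elasticsearch type.
    for opts in type_options:
        steps.append(("pause", MAPRCLI_ES + ["pause"] + list(opts)))
    # 2. Re-register with the same parameters plus -f, since the cluster
    #    is not unregistered first.
    argv = list(registration_argv)
    if "-f" not in argv:
        argv.append("-f")
    steps.append(("register", argv))
    # 3. Restart the indexing gateways (topology-specific, so no command here).
    steps.append(("restart-gateways", []))
    # 4. Resume indexing for the same types.
    for opts in type_options:
        steps.append(("resume", MAPRCLI_ES + ["resume"] + list(opts)))
    return steps
```
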
If you want to delete the registration for an Elasticsearch cluster, run the script with these parameters:
  • -c: Specify the name that was used for the Elasticsearch cluster when it was registered.
  • -d: Delete the registration.
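
The listing, help, and deletion invocations described above each take only a few parameters. This Python sketch builds those command lines (the function names are hypothetical); pass one to subprocess.run(..., check=True) on a MapR node to execute it.

```python
from typing import List

SCRIPT = "/opt/mapr/bin/register-elasticsearch"


def list_argv() -> List[str]:
    """List the Elasticsearch clusters registered with this MapR cluster."""
    return [SCRIPT, "-l"]


def help_argv() -> List[str]:
    """Show the script's help."""
    return [SCRIPT, "-h"]


def delete_argv(registered_name: str) -> List[str]:
    """Delete a registration, using the name that was given to -c."""
    if not registered_name:
        raise ValueError("the registered cluster name (-c) is required")
    return [SCRIPT, "-c", registered_name, "-d"]
```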