Converting Data to Elasticsearch Data Types that MapR-DB Supports
If you want to convert source data (which is stored as byte arrays) to Elasticsearch
types that MapR-DB supports, you can create each destination index explicitly with
Elasticsearch’s create index API and then define the mapping of data types
with Elasticsearch’s put mapping API. MapR gateways perform the data conversion.
About this task
Here is a list of the data types that gateways can convert your source data into by using this method:
- Binary
  A base64 representation of binary data that can be stored in an index.
- Core Elasticsearch data types
  - boolean
  - byte
  - double
  - float
  - integer
  - long
  - short
  - string
  - date
- geolocation
- IP addresses
Gateways use java.nio.ByteBuffer to convert source data to the boolean, byte, double,
float, integer, long, short, and date data types. IP addresses and geolocations are
passed as strings.
Restrictions
- boolean
  Boolean values must be represented by single bytes.
- date
  Timestamps must be long integers representing the time in milliseconds since the epoch.
- geolocation
  Geolocations must be pairs of latitude and longitude coordinates or geohash data types encoded as UTF-8 strings.
- IP address
  IP addresses must be UTF-8 encoded strings.
If your data cannot meet these requirements, then you must write Java routines to tell MapR-DB how to perform custom conversions.
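As a rough illustration of these restrictions, the following Java sketch writes values into byte arrays in the layouts described above. The class and method names are hypothetical and are not part of any MapR API. The use of 1 and 0 for the boolean byte, ByteBuffer's default big-endian byte order, and the "lat,lon" string form for geolocations are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a MapR class): encodes values as byte arrays
// in the layouts listed under Restrictions.
class SourceValueEncoder {

    // boolean: a single byte. Assumption: 1 means true, 0 means false.
    static byte[] encodeBoolean(boolean value) {
        return new byte[] { (byte) (value ? 1 : 0) };
    }

    // date: a long holding milliseconds since the epoch.
    // ByteBuffer writes in big-endian order unless told otherwise.
    static byte[] encodeDate(long epochMillis) {
        return ByteBuffer.allocate(Long.BYTES).putLong(epochMillis).array();
    }

    // IP addresses and geolocations: UTF-8 encoded strings.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] active = encodeBoolean(true);
        byte[] signup = encodeDate(System.currentTimeMillis());
        byte[] lastIp = encodeUtf8("10.0.0.1");
        // Assumption: latitude and longitude written as a "lat,lon" string.
        byte[] home = encodeUtf8("41.12,-71.34");

        System.out.println("boolean bytes: " + active.length);
        System.out.println("date bytes: " + signup.length);
        System.out.println("ip bytes: " + lastIp.length);
        System.out.println("geolocation bytes: " + home.length);
    }
}
```

The resulting byte arrays are what an application would store in the source table. Reading a value back with ByteBuffer.wrap(bytes).getLong() is a quick way to check that a timestamp round-trips as expected.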
Procedure
- Create the index in Elasticsearch by calling Elasticsearch’s create index API. See Index API in the Elasticsearch documentation.
- Call Elasticsearch’s put mapping API to register specific data-type mapping definitions for the type. When MapR-DB first puts data into the index, it calls Elasticsearch’s get mapping API to retrieve the mapping definitions. See Put Mapping in the Elasticsearch documentation.
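The two calls in this procedure can be sketched in Java with the standard java.net.http client (Java 11 or later). This is only an illustration under several assumptions: Elasticsearch is reachable at http://localhost:9200, the cluster uses the type-based REST paths of older Elasticsearch releases (PUT /{index} and PUT /{index}/_mapping/{type}), and the index name, type name, and field names are all hypothetical. The Elasticsearch type names ip and geo_point are used here for IP addresses and geolocations.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical example class; all names and the URL are assumptions.
class CreateIndexAndMapping {
    static final String BASE_URL = "http://localhost:9200"; // assumed address
    static final String INDEX = "customers";                // hypothetical index
    static final String TYPE = "profile";                   // hypothetical type

    // Mapping body: one entry per field, keyed by the type name.
    static String mappingJson() {
        return "{\"" + TYPE + "\":{\"properties\":{"
            + "\"active\":{\"type\":\"boolean\"},"
            + "\"signup\":{\"type\":\"date\"},"
            + "\"last_ip\":{\"type\":\"ip\"},"
            + "\"home\":{\"type\":\"geo_point\"}"
            + "}}}";
    }

    // Step 1: create the index with an empty PUT to /{index}.
    static HttpRequest createIndexRequest() {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + INDEX))
            .PUT(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    // Step 2: register the mapping with a PUT to /{index}/_mapping/{type}.
    static HttpRequest putMappingRequest() {
        return HttpRequest.newBuilder(
                URI.create(BASE_URL + "/" + INDEX + "/_mapping/" + TYPE))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(mappingJson()))
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest[] steps = { createIndexRequest(), putMappingRequest() };
        for (HttpRequest request : steps) {
            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(request.uri() + " -> " + response.statusCode());
        }
    }
}
```

Running main requires a reachable Elasticsearch cluster. The request-building methods can be inspected without one, which is useful for checking the paths and the mapping body before sending anything.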
What to do next
If you have not done so already, register your Elasticsearch cluster or clusters with your MapR source cluster.
If you have already registered your Elasticsearch cluster, configure replication to types in Elasticsearch.
- Pause indexing of your MapR-DB source tables. To get a list of the Elasticsearch types that are used for each source table, use the maprcli table replica elasticsearch list command. For each Elasticsearch type, issue the maprcli table replica elasticsearch pause command to pause indexing.
- Restart the MapR gateways that you are using for indexing. See the section "On clusters where gateways are running" in Configuring a MapR Gateway Master-Slave Topology.
- Resume indexing by issuing the maprcli table replica elasticsearch resume command for each Elasticsearch type that you are indexing your data in.