Connectors, Tasks, and Workers

This section describes how Kafka Connect for MapR Streams works and how connectors, tasks, offsets, and workers are associated with each other.


Connectors (or connector instances) are logical jobs that manage the copying of data between MapR Streams and other systems. Each connector instantiates a set of tasks that copy the data. Because a connector can break a single job into many tasks, parallelism and scalable data copying are built in and require very little configuration. Connector plugins are JAR files that contain the classes that implement a connector.
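The way a connector breaks one job into several tasks can be illustrated with a minimal sketch. This is not the Kafka Connect API: the function name split_into_task_configs, the "tables" configuration key, and the table names are hypothetical, and round-robin grouping is only one reasonable way to divide the work.

```python
from typing import Dict, List


def split_into_task_configs(tables: List[str], max_tasks: int) -> List[Dict[str, str]]:
    """Divide the tables a connector must copy among at most max_tasks tasks."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    # Never create more tasks than there are units of work.
    num_tasks = min(max_tasks, len(tables))
    groups: List[List[str]] = [[] for _ in range(num_tasks)]
    # Round-robin assignment keeps the groups close to equal in size.
    for index, table in enumerate(tables):
        groups[index % num_tasks].append(table)
    # Each task receives its own configuration listing the tables it copies.
    return [{"tables": ",".join(group)} for group in groups]


task_configs = split_into_task_configs(["orders", "customers", "payments"], max_tasks=2)
for task_id, config in enumerate(task_configs):
    print(task_id, config)
```

Each returned dictionary stands for the configuration of one task, so raising max_tasks increases parallelism until there is one task per table.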


As connectors run, Kafka Connect tracks offsets for each one so that connectors can resume from their previous position in the event of failures or graceful restarts for maintenance. Offsets record the current position in the stream of data being copied, and each connector may need to track many offsets for different partitions of the stream. For example, when loading data from a database, the offset might be a transaction ID that identifies a position in the database change log.

Users generally do not need to worry about the format of offsets, especially since they differ from connector to connector. However, Kafka Connect does require persistent storage for offset data to ensure it can recover from faults. This storage for offset data is configurable. See Standalone Worker Configuration Options and Distributed Worker Configuration Options.
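The role of persistent offset storage can be sketched as follows. This is an illustration only, not how Kafka Connect stores offsets internally: the FileOffsetStore class, the JSON file layout, and the partition names are all hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class FileOffsetStore:
    """Hypothetical store: maps a source partition name to its last committed offset."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offsets: Dict[str, int] = {}
        # Reload previously committed offsets so work resumes after a restart.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as handle:
                self.offsets = json.load(handle)

    def commit(self, partition: str, offset: int) -> None:
        """Record the position reached in one partition and persist it."""
        self.offsets[partition] = offset
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(self.offsets, handle)

    def position(self, partition: str) -> Optional[int]:
        """Return the last committed offset, or None if nothing was copied yet."""
        return self.offsets.get(partition)


offset_file = os.path.join(tempfile.mkdtemp(), "connect.offsets.json")

store = FileOffsetStore(offset_file)
store.commit("changelog-0", 1041)  # for example, a transaction ID in a change log
store.commit("changelog-1", 987)

# Simulate a restart: a new instance reads the persisted offsets.
restarted = FileOffsetStore(offset_file)
print(restarted.position("changelog-0"))
print(restarted.position("changelog-2"))
```

Because every commit is written to disk, a new instance created after a failure picks up each partition at its last recorded position, and a partition with no recorded offset starts from the beginning.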


Connectors and tasks are logical units of work and must be scheduled to execute in a process. Kafka Connect calls these processes workers. With Kafka Connect for MapR Streams, the worker processes run as a service. This service can be run in either standalone mode or distributed mode.
  • In standalone mode, the cluster consists of a single worker. This mode is useful for testing and debugging purposes.
  • In distributed mode, the cluster consists of multiple workers, and connector tasks are submitted via the Kafka Connect REST API.
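Submitting a connector through the Kafka Connect REST API amounts to sending a POST request with a JSON body to the /connectors endpoint of a worker. The sketch below only builds such a request. It assumes a worker listening on localhost at port 8083 and assumes that the FileStreamSourceConnector class from Apache Kafka is available on that worker. The connector name, file path, and topic are placeholders.

```python
import json
import urllib.request

# Assumed values: host, connector name, file path, and topic are placeholders.
CONNECT_URL = "http://localhost:8083/connectors"

connector = {
    "name": "example-file-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/example-input.txt",
        "topic": "/example-stream:example-topic",
    },
}

request = urllib.request.Request(
    CONNECT_URL,
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out so the sketch runs without a live worker.
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
print(request.get_method(), request.full_url)
```

Uncommenting the urlopen call sends the request to the worker. The tasks.max setting caps how many tasks the connector may create.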

The following lists the locations of the standalone and distributed worker configuration files:

Note: Non-distributed mode is not available in version 2.0.1-1611 of Kafka Connect for MapR Streams.
Note: Port 8083 is the default port.
Note: If you are running multiple workers on the same node, the rest.port parameter must be different for each worker.
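The port requirement can be checked before starting the workers. The sketch below is a hypothetical helper, not part of Kafka Connect: it takes the properties of each worker on a node as a dictionary and reports any rest.port value used more than once, treating a missing rest.port as the default of 8083. The worker names are made up for illustration.

```python
from typing import Dict, List


def find_port_conflicts(workers: Dict[str, Dict[str, str]], default_port: str = "8083") -> List[str]:
    """Return the rest.port values claimed by more than one worker on a node."""
    seen: Dict[str, str] = {}
    conflicts: List[str] = []
    for worker_name, properties in workers.items():
        # A worker without an explicit rest.port uses the default port.
        port = properties.get("rest.port", default_port)
        if port in seen and port not in conflicts:
            conflicts.append(port)
        seen.setdefault(port, worker_name)
    return conflicts


same_node_workers = {
    "worker-a": {},                     # falls back to the default port
    "worker-b": {"rest.port": "8084"},
    "worker-c": {"rest.port": "8083"},  # collides with worker-a
}
print(find_port_conflicts(same_node_workers))
```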