Connectors, Tasks, and Workers
This section describes how Kafka Connect for MapR Streams works and how connectors, tasks, offsets, and workers are related to each other.
Connectors (or connector instances) are logical jobs that manage the copying of data between MapR Streams and other systems. Each connector instantiates a set of tasks that copy the data. Because a connector can break a single job into many tasks, parallelism and scalable data copying are supported with very little configuration. Connector plugins are JARs that add the classes that implement a connector.
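For illustration, a standalone source-connector configuration might look like the following. The connector class shown ships with Apache Kafka Connect; the file path and topic name are hypothetical, and the topic uses the /stream:topic form used to address MapR Streams topics:

```
# Illustrative connector configuration (names and values are assumptions)
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
# The connector may split its job into up to this many tasks
tasks.max=2
file=/tmp/input.txt
topic=/sample-stream:topic1
```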
As connectors run, Kafka Connect tracks offsets for each one so that connectors can resume from their previous position in the event of failures or graceful restarts for maintenance. Offsets track the current position in the stream of data being copied, and each connector may need to track many offsets for the different partitions of the stream. For example, when loading data from a database, the offset might be a transaction ID that identifies a position in the database change log.
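Conceptually, a stored offset pairs a source partition with a position in that partition. For a hypothetical database source connector tracking one offset per table, the tracked data might look like this (the field names are illustrative, not an actual storage format):

```
{
  "partition": { "table": "orders" },
  "offset":    { "transaction_id": 42 }
}
```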
Users generally do not need to worry about the format of offsets, especially since they differ from connector to connector. However, Kafka Connect does require persistent storage for offset data to ensure it can recover from faults. This storage for offset data is configurable. See Standalone Worker Configuration Options and Distributed Worker Configuration Options.
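For example, a standalone worker persists offsets to a local file, while a distributed worker persists them to a topic shared by the cluster; the path and topic name below are illustrative:

```
# Standalone worker: offsets stored in a local file
offset.storage.file.filename=/tmp/connect.offsets

# Distributed worker: offsets stored in a topic shared by the cluster
offset.storage.topic=/sample-stream:connect-offsets
```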
- In standalone mode, the cluster consists of a single worker that runs all connectors and tasks, which is useful for testing and debugging purposes.
- In distributed mode, the cluster consists of multiple workers that share the same group.id, offset.storage.topic, and config.storage.topic. Connector tasks are submitted via the Kafka Connect REST API.
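As a sketch, the shared settings in a distributed worker configuration might look like this (the group and topic names are assumptions):

```
# Workers with the same group.id form one Connect cluster
group.id=connect-cluster
offset.storage.topic=/sample-stream:connect-offsets
config.storage.topic=/sample-stream:connect-configs
```

A connector can then be submitted to any worker's REST endpoint; the host, port, and connector details below are illustrative:

```
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
        "name": "sample-source",
        "config": {
          "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
          "tasks.max": "3",
          "file": "/tmp/input.txt",
          "topic": "/sample-stream:topic1"
        }
      }'
```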
The following lists the locations of the standalone and distributed worker configuration files: