Select Services

This section describes some of the services that can be run on a node.

Every installation requires services to manage jobs and applications. ResourceManager and NodeManager manage MapReduce version 2 and other applications that can run on YARN. In addition, HPE Ezmeral Data Fabric requires the ZooKeeper service to coordinate the cluster, and at least one node must run the CLDB service. The WebServer service is required if you want to use the browser-based Control System.
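To make these baseline requirements concrete, here is a minimal Python sketch that checks a proposed layout against them. It assumes a hypothetical layout format: a dictionary that maps each node name to the set of services planned for that node. It uses the service names from this section, and the function name `check_layout` is illustrative. The sketch does not read a real cluster. The Warden check comes from the table below, which states that Warden runs on every node.

```python
"""Check a hypothetical node layout against the baseline service
requirements described above. Node names and the layout format are
illustrative assumptions; nothing here queries a real cluster."""

from typing import Dict, List, Set

# Hypothetical layout: node name -> set of service names planned for it.
Layout = Dict[str, Set[str]]


def check_layout(layout: Layout, want_control_system: bool = True) -> List[str]:
    """Return a list of problems; an empty list means the layout passes."""
    problems: List[str] = []
    # Union of every service that appears on any node (empty if no nodes).
    all_services: Set[str] = set().union(*layout.values())

    # At least one node must run the CLDB service.
    if "CLDB" not in all_services:
        problems.append("no node runs CLDB")

    # ZooKeeper coordinates the cluster.
    if "ZooKeeper" not in all_services:
        problems.append("no node runs ZooKeeper")

    # YARN applications, including MapReduce version 2, need both managers.
    for svc in ("ResourceManager", "NodeManager"):
        if svc not in all_services:
            problems.append(f"no node runs {svc}")

    # The browser-based Control System needs the WebServer service.
    if want_control_system and "WebServer" not in all_services:
        problems.append("no node runs WebServer (needed for the Control System)")

    # Warden must run on every node.
    for node, services in sorted(layout.items()):
        if "Warden" not in services:
            problems.append(f"{node} does not run Warden")

    return problems


if __name__ == "__main__":
    layout = {
        "node1": {"Warden", "CLDB", "ZooKeeper", "ResourceManager", "WebServer"},
        "node2": {"Warden", "FileServer", "NodeManager"},
        "node3": {"FileServer", "NodeManager"},
    }
    print(check_layout(layout))  # prints ['node3 does not run Warden']
```

Passing `want_control_system=False` skips the WebServer check for clusters that will be managed without the browser-based Control System.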

After you install HPE Ezmeral Data Fabric core, you can install ecosystem components that belong to an Ecosystem Pack (EEP). An EEP provides a set of ecosystem components that work together. When a new version of, or a revision to, a component becomes available, the EEP version is updated to reflect the change. For more information about the ecosystem components available in each EEP and a list of EEPs supported by your HPE Ezmeral Data Fabric cluster version, see Ecosystem Packs (EEPs).
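You can picture the EEP versioning rule as an immutable record. The component versions are pinned together, and changing any one of them produces a new EEP version instead of altering the existing one. The following sketch is a hypothetical model for illustration only. The `EEP` class, the `revise` method, and every version string are placeholders, not real EEP releases or an HPE API.

```python
"""Model an Ecosystem Pack (EEP) as a pinned set of component versions.
All names and version strings are hypothetical placeholders."""

from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EEP:
    version: str
    components: Dict[str, str]  # component name -> component version

    def revise(self, new_eep_version: str, component: str, component_version: str) -> "EEP":
        """Return a new EEP in which one component is added or updated.

        The original EEP is left untouched: a component update is
        published under a new EEP version rather than changing the old one."""
        if new_eep_version == self.version:
            raise ValueError("an updated component requires a new EEP version")
        updated = dict(self.components)  # copy so the old EEP keeps its versions
        updated[component] = component_version
        return EEP(new_eep_version, updated)


if __name__ == "__main__":
    base = EEP("eep-a", {"Spark": "s1", "Hive": "h1"})
    newer = base.revise("eep-b", "Spark", "s2")
    print(base.components["Spark"], newer.components["Spark"])  # prints "s1 s2"
```

Because `revise` copies the component map, the older EEP still describes the exact set of versions that were released together.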

The following table shows some of the services that can be run on a node:

Service Category | Service | Description
Management | Warden | Warden runs on every node, coordinating the node's contribution to the cluster. Warden is also responsible for managing the state of the services on that node and their resource allocations.
YARN | NodeManager | Hadoop YARN NodeManager service. The NodeManager manages node resources and monitors the health of the node. It works with the ResourceManager to manage the YARN containers that run on the node.
HPE Ezmeral Data Fabric Core | FileServer | FileServer is the HPE Ezmeral Data Fabric service that manages disk storage for the file system and HPE Ezmeral Data Fabric Database on each node.
HPE Ezmeral Data Fabric Core | CLDB | Maintains the container location database (CLDB). The CLDB service coordinates data storage services among file server nodes, as well as access across HPE Ezmeral Data Fabric NFS gateways and HPE Ezmeral Data Fabric clients.
HPE Ezmeral Data Fabric Core | NFS | Provides HPE Ezmeral Data Fabric Direct Access NFS™ access to the cluster, with full support for concurrent read and write access.
Storage | MapR HBase Client | Provides access to HPE Ezmeral Data Fabric Database binary tables through the HBase APIs. Required on all nodes that access table data in the file system, typically all edge nodes. The HBase API can also be accessed through the HBase Thrift and REST gateways.
YARN | ResourceManager | Hadoop YARN ResourceManager service. The ResourceManager manages cluster resources and tracks resource usage and node health.
Management | ZooKeeper | Internal service. Enables high availability (HA) and fault tolerance for HPE Ezmeral Data Fabric clusters by providing coordination.
YARN | HistoryServer | Archives MapReduce application metrics and metadata.
Management | Web Server | Hosts the static Control System user interface pages.
Management | Apiserver | Allows you to perform cluster administration programmatically, and supports the Control System (see Setting Up the Control System).
OJAI Distributed Query Service | Drill | Provides the distributed query service, powered by Apache Drill, for HPE Ezmeral Data Fabric Database JSON. Supports the following functionality:
  • Advanced secondary index selection
  • Sorts on large data sets
  • Parallel query execution
  See OJAI Distributed Query Service for more details about the service.
Application | Hue | Hue is a web-based user interface that interacts with Apache Hadoop and its ecosystem components, such as Hive, Pig, and Oozie. It also provides interactive notebook access to Spark through Livy.
Application | Hive | Hive is a data warehouse engine that supports SQL-like ad hoc querying and data summarization.
Application | HCatalog | HCatalog provides applications with a table view of the cluster's file system layer, expanding your options from read/write data streams to table operations such as get row and store row.
Application | Cascading | Cascading is an application framework for analyzing and managing big data.
Application | Spark | Spark is a processing engine for large datasets. Although it can be deployed locally or in standalone mode, the recommended deployment is on YARN. The application timeline server component provides a historical view of query details.
Application | Airflow | Apache Airflow is a tool that helps you author, schedule, and monitor workflows or data pipelines.
Application | Ranger | Apache Ranger is a framework to enable, monitor, and manage data security across the Hadoop platform in HPE Ezmeral Data Fabric. Ranger provides centralized security administration and fine-grained access control for user access within Apache Hadoop, Apache Hive, Apache HBase, and other Apache components.
Application | NiFi | Apache NiFi is a dataflow system based on the concepts of flow-based programming. Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. NiFi has a web-based user interface for the design, control, feedback, and monitoring of dataflows.
Application | OTel | OTel (OpenTelemetry) is an observability framework that allows you to instrument, generate, collect, and export telemetry data.
Application | Zeppelin | Apache Zeppelin is an open-source, web-based data science notebook. You can use it with Data Fabric components to conduct data discovery, ETL, machine learning, and data visualization.
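If you script against this table, for example to list every YARN service, a small lookup structure keeps category and service names together. The following is a minimal, hypothetical sketch. It copies the category and service columns into a list and groups them by category. The names `SERVICES` and `by_category` are illustrative and are not part of any HPE Ezmeral Data Fabric tool.

```python
"""Hypothetical in-memory catalog of the services in the table above,
grouped by service category. A lookup aid only, not an installer."""

from collections import defaultdict
from typing import Dict, List, Tuple

# (category, service) pairs copied from the table, in table order.
SERVICES: List[Tuple[str, str]] = [
    ("Management", "Warden"),
    ("YARN", "NodeManager"),
    ("HPE Ezmeral Data Fabric Core", "FileServer"),
    ("HPE Ezmeral Data Fabric Core", "CLDB"),
    ("HPE Ezmeral Data Fabric Core", "NFS"),
    ("Storage", "MapR HBase Client"),
    ("YARN", "ResourceManager"),
    ("Management", "ZooKeeper"),
    ("YARN", "HistoryServer"),
    ("Management", "Web Server"),
    ("Management", "Apiserver"),
    ("OJAI Distributed Query Service", "Drill"),
    ("Application", "Hue"),
    ("Application", "Hive"),
    ("Application", "HCatalog"),
    ("Application", "Cascading"),
    ("Application", "Spark"),
    ("Application", "Airflow"),
    ("Application", "Ranger"),
    ("Application", "NiFi"),
    ("Application", "OTel"),
    ("Application", "Zeppelin"),
]


def by_category(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group service names under their category, preserving input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for category, service in pairs:
        grouped[category].append(service)
    return dict(grouped)


if __name__ == "__main__":
    print(by_category(SERVICES)["YARN"])
```

For example, `by_category(SERVICES)["YARN"]` returns `['NodeManager', 'ResourceManager', 'HistoryServer']`, in table order.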