Planning and Initial Deployment

There are several considerations to take into account before migrating from Apache Hadoop to data-fabric Hadoop.

The first phase of migration is planning. In this phase, you identify the requirements and goals of the migration, anticipate potential issues, and define a migration strategy.

The requirements and goals of the migration depend on a number of factors:

  • Data migration: can you move your datasets individually, or must the data be moved all at once?
  • Downtime: can you tolerate downtime, or is it important to complete the migration with no interruption in service?
  • Customization: what custom patches or applications are running on the cluster?
  • Storage: is there enough space to store the data during the migration? (A rough size check is sketched below.)
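
One way to answer the storage question is to measure each dataset on the source cluster before scheduling the move. The sketch below uses the standard Hadoop FileSystem API to report a directory's logical size and the replicated space it consumes; the DatasetSizeCheck class name and the /user/data default path are illustrative only.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative sketch: report how much space a dataset occupies so you can
    // confirm there is room to stage it during the migration.
    public class DatasetSizeCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up the client configuration on the classpath
            FileSystem fs = FileSystem.get(conf);

            // Path to measure; /user/data is a placeholder for your own dataset.
            Path dataset = new Path(args.length > 0 ? args[0] : "/user/data");

            ContentSummary summary = fs.getContentSummary(dataset);
            System.out.printf("Logical size:   %d bytes%n", summary.getLength());
            System.out.printf("Space consumed: %d bytes (includes replication)%n",
                    summary.getSpaceConsumed());
            System.out.printf("Files / dirs:   %d / %d%n",
                    summary.getFileCount(), summary.getDirectoryCount());
        }
    }

Running the same check against the target cluster after each dataset is copied also gives a quick sanity check before cutover.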

The data-fabric Hadoop distribution is 100% plug-and-play compatible with Apache Hadoop, so you do not need to make changes to your applications to run them on a data-fabric cluster. Data-fabric Hadoop automatically configures compression and memory settings, task heap sizes, and local volumes for shuffle data.
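
To make the compatibility point concrete, the familiar MapReduce WordCount example below is written entirely against stock org.apache.hadoop APIs. Per the paragraph above, a jar built from code like this is expected to run on the data-fabric cluster without source changes; the class is shown purely as an illustration, not as data-fabric-specific code.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Standard WordCount: uses only Apache Hadoop interfaces, so the same jar
    // can be submitted on either cluster with the usual hadoop jar command.
    public class WordCount {

        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }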

Initial Deployment

The initial data-fabric deployment phase consists of installing, configuring, and testing the data-fabric cluster and any ecosystem components (such as Hive or Pig) on an initial set of nodes. Once you have the data-fabric cluster deployed, you will be able to begin migrating data and applications.
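
Part of testing the new cluster can be a simple write/read round trip through the FileSystem API before any real data is moved. The sketch below is one such check; the ClusterSmokeTest class name and the /tmp/migration-smoke-test.txt path are hypothetical, and the program assumes the data-fabric client configuration is on the classpath.

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative sketch: write a small file to the newly deployed cluster,
    // read it back, and clean up, to confirm basic client connectivity.
    public class ClusterSmokeTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path testFile = new Path("/tmp/migration-smoke-test.txt");  // hypothetical test location

            // Write a small marker file, overwriting any previous run.
            try (FSDataOutputStream out = fs.create(testFile, true)) {
                out.write("data-fabric smoke test".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back and print the contents to confirm the round trip.
            try (FSDataInputStream in = fs.open(testFile)) {
                byte[] buffer = new byte[64];
                int n = in.read(buffer);
                System.out.println("Read back: " + new String(buffer, 0, n, StandardCharsets.UTF_8));
            }

            fs.delete(testFile, false);   // remove the test artifact
        }
    }

If the write, read-back, and delete all succeed, basic client connectivity and permissions on the new cluster are in place before migration traffic starts.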

To deploy the data-fabric cluster on the selected nodes, see the Installing Core and Ecosystem Components topic.