Planning and Initial Deployment
There are several considerations to take into account before migrating from Apache Hadoop to data-fabric Hadoop.
The first phase of migration is planning. In this phase you will identify the requirements and goals of the migration, anticipate potential issues, and define a migration strategy.
The requirements and goals of the migration depend on a number of factors:
- Data migration: can you move your datasets individually, or must the data be moved all at once?
- Downtime: can you tolerate downtime, or is it important to complete the migration with no interruption in service?
- Customization: what custom patches or applications are running on the cluster?
- Storage: is there enough space to store the data during the migration?
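The storage question above can be made concrete with a quick back-of-the-envelope check. The sketch below is a hypothetical sizing aid, not part of any data-fabric tooling: the function names, the default 3x replication factor, and the 25% overhead margin for temporary/shuffle data are all illustrative assumptions you should replace with your own cluster's figures.

```python
# Hypothetical pre-migration capacity check. Replace replication_factor,
# overhead, and the capacity numbers with values from your own cluster.

def required_space_tb(dataset_tb, replication_factor=3, overhead=0.25):
    """Estimate the raw space a dataset needs on the target cluster.

    Accounts for block replication plus a safety margin (overhead) for
    temporary and shuffle data written while the migration is in flight.
    """
    return dataset_tb * replication_factor * (1 + overhead)

def has_capacity(dataset_tb, raw_capacity_tb, used_tb, **kwargs):
    """True if the free raw capacity covers the estimated requirement."""
    return (raw_capacity_tb - used_tb) >= required_space_tb(dataset_tb, **kwargs)

# Example: migrating a 100 TB dataset (3x replication, 25% margin needs
# 375 TB raw) onto a 500 TB cluster that already has 80 TB in use.
print(has_capacity(100, raw_capacity_tb=500, used_tb=80))  # → True
```

If the check fails, options include migrating datasets in smaller batches, reducing the replication factor during the copy, or freeing space on the target before starting.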
The data-fabric Hadoop distribution is 100% plug-and-play compatible with Apache Hadoop, so you do not need to make changes to your applications to run them on a data-fabric cluster. Data-fabric Hadoop automatically configures compression and memory settings, task heap sizes, and local volumes for shuffle data.
Initial Deployment
The initial data-fabric deployment phase consists of installing, configuring, and testing the data-fabric cluster and any ecosystem components (such as Hive or Pig) on an initial set of nodes. Once you have the data-fabric cluster deployed, you will be able to begin migrating data and applications.
To deploy the data-fabric cluster on the selected nodes, see Installing Core and Ecosystem Components.