Integrate Spark with R

Integrate Spark with R when you want to run R programs as Spark jobs.

About this task

As of Spark 1.5.2, you can integrate Spark with R through SparkR, an R package that provides access to Spark DataFrames from R.

Procedure

  1. Install R 3.2.2 or later on each node that will submit Spark jobs:
    • On Ubuntu:
      apt-get install r-base-dev
    • On CentOS/Red Hat:
      yum install R

    For more information on installing R, see the R documentation.

  2. To verify the integration, run the following commands as the mapr user or as a user that mapr impersonates:
    1. Start SparkR:
/opt/mapr/spark/spark-1.5.2/bin/sparkR --master <master-url>

      In this command, <master-url> specifies the cluster manager, for example yarn-client or a spark://<host>:<port> URL.
2. Run the following command to create a DataFrame from the sample data (the sparkR shell creates the sqlContext object automatically at startup):
      people <- read.df(sqlContext, "file:///opt/mapr/spark/spark-1.5.2/examples/src/main/resources/people.json", "json")
3. Run the following command to display the first rows of the DataFrame that you just created:
      head(people)
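
If head(people) returns the rows from the sample file, the integration is working. To exercise the SQL path as well, you can register the DataFrame as a temporary table and query it from the same sparkR session. The following is a minimal sketch; it assumes the people DataFrame from the previous step and the sqlContext object that the sparkR shell creates at startup:

  # Register the DataFrame as a temporary table so that it can be queried with SQL
  registerTempTable(people, "people")

  # Select the names of people aged 13 to 19 from the temporary table
  teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

  # Collect the distributed result into the local R session and print it
  collect(teenagers)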
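
To run an R program as a noninteractive Spark job, you can place the same logic in a script and pass the script to spark-submit, which accepts R files in Spark 1.5.2. The following is a minimal sketch; the file name people-example.R and the application name PeopleExample are placeholders:

  library(SparkR)

  # Initialize a SparkContext and an SQLContext explicitly; a standalone script
  # does not get the sc and sqlContext objects that the sparkR shell creates
  sc <- sparkR.init(appName = "PeopleExample")
  sqlContext <- sparkRSQL.init(sc)

  # Load the sample data and print the first rows, as in the interactive check
  people <- read.df(sqlContext, "file:///opt/mapr/spark/spark-1.5.2/examples/src/main/resources/people.json", "json")
  print(head(people))

  sparkR.stop()

Submit the script in the same way that you would submit a Python application:

  /opt/mapr/spark/spark-1.5.2/bin/spark-submit --master <master-url> people-example.R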