hdfs_write_revised.c

This application demonstrates how to write to files by using the APIs hdfsWrite() and hdfsPwrite(): hdfs_write_revised.c

Before running this application:

  • Ensure that you have access to a cluster running MapR-FS.
  • Ensure that a text-based file that you have access to exists on the cluster. Note the path to the file and the size of the file in bytes.
  • Be aware that the application deletes the content of the file before it performs the first write.
  • Decide on the length in bytes of a string to write to the file.

To build and run it, download it from this page to a MapR client or to a system with the mapr-core package installed. Then, modify the run.sh script in Building and Running C Applications on MapR-FS Clients to point to this sample application. Run the script and then run the application.

The application includes these header files:

  • stdio.h
  • hdfs.h
  • errno.h
  • fcntl.h

The APIs are defined in hdfs.h. The file fcntl.h defines the file-access flags.

The application performs the actions that are described in the following sections.

Takes a filename, file size, and buffer size as input

When you launch the application, provide the path and name of the file, the size of the file, and the number of bytes to write.

hdfs_write <filename> <filesize> <buffersize>

Sets an RPC timeout

hdfsSetRpcTimeout() is specific to the libMapRClient version of libhdfs and takes a value that is specified in seconds. The default is 99 seconds. If you change this value, set it either to 0 (which eliminates timeouts) or to a value greater than 30.

  int err = hdfsSetRpcTimeout(30);
  if (err) {
    fprintf(stderr, "Failed to set rpc timeout!\n");
    exit(-1);
  }

Connects to a filesystem, using an API that is supported in the hadoop-0.20.2 version of libhdfs

The application tries to connect to the first MapR-FS cluster that is specified in the mapr-clusters.conf file in the MAPR_HOME/conf directory on the client. If the connection succeeds, hdfsConnect() returns a handle to the filesystem, which the application stores in fs.

  hdfsFS fs = hdfsConnect("default", 0);
  if (!fs) {
    fprintf(stderr, "Oops! Failed to connect to hdfs!\n");
    exit(-1);
  }

Stores the values of the arguments

The application stores the values of the arguments in a character pointer and in two variables of type tSize. This datatype is defined in hdfs.h and is a fixed-width, signed 32-bit integer type for storing the size of data for read or write operations.

  const char* rfile = argv[1];
  tSize fileSize = strtoul(argv[2], NULL, 10);
  tSize bufferSize = strtoul(argv[3], NULL, 10);
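
The snippet above assigns the result of strtoul() directly to a tSize without checking for invalid or out-of-range input. The following is a minimal sketch of a stricter parser, assuming that a size must be a non-negative decimal number that fits in a signed 32-bit value. The helper name parse_size() is hypothetical and is not part of the sample application, and int32_t is used here in place of tSize so that the sketch compiles without hdfs.h.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the sample application.
 * Parses a decimal byte count into a signed 32-bit value (the width of tSize).
 * Returns 0 on success, or -1 if the text is empty, has trailing characters,
 * is negative, or does not fit in 32 bits. */
int parse_size(const char *text, int32_t *out)
{
  char *end = NULL;
  long long value;

  if (text == NULL || out == NULL || *text == '\0') {
    return -1;
  }
  errno = 0;
  value = strtoll(text, &end, 10);
  if (errno != 0 || end == text || *end != '\0') {
    return -1;   /* not a number, or followed by extra characters */
  }
  if (value < 0 || value > INT32_MAX) {
    return -1;   /* would not fit in a signed 32-bit size */
  }
  *out = (int32_t)value;
  return 0;
}
```

A caller could use this for both size arguments, for example parse_size(argv[2], &fileSize), and exit with a usage message when it returns -1.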

Opens the file that you specified

The application opens the specified file, passing the following values to the hdfsOpenFile() function:

  • The handle to the filesystem
  • The name of the file, which you supplied when you launched the application.
  • A flag to indicate the mode in which to open the file. In this case, the flag is O_WRONLY. This flag creates the file if the file does not exist and truncates the file if the file does exist. If the file existed and you wanted to preserve the content of the file, you would specify O_WRONLY | O_APPEND for flag. These flags are defined in the header file fcntl.h.
  • The default chunk size for the directory in which the file is either located or will be created. This value is specified by the 0 in the last parameter.

The hdfsOpenFile() function has two other parameters, the fourth and the fifth, but the libMapRClient version of libhdfs ignores them.

  hdfsFile writeFile = hdfsOpenFile(fs, rfile, O_WRONLY, 0, 0, 0);
  if (!writeFile) {
    fprintf(stderr, "Failed to open %s for writing!\n", rfile);
    exit(-2);
  }
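
The choice between truncating and preserving existing content, described in the list above, can be sketched as a small helper. The function name write_flags() and its parameter are hypothetical and are not part of the sample application. Only the O_WRONLY and O_APPEND constants come from fcntl.h.

```c
#include <fcntl.h>

/* Hypothetical helper, not part of the sample application.
 * Returns the flag value to pass as the third argument of hdfsOpenFile():
 * O_WRONLY alone when existing content may be discarded, or
 * O_WRONLY | O_APPEND when existing content must be preserved. */
int write_flags(int preserve_existing)
{
  int flags = O_WRONLY;

  if (preserve_existing) {
    flags |= O_APPEND;
  }
  return flags;
}
```

With such a helper, the open call could take the form hdfsOpenFile(fs, rfile, write_flags(1), 0, 0, 0) to append instead of truncating.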

Creates a buffer of the size that you specified and populates the buffer

At this point, the application allocates the buffer and fills it with a repeating sequence of lowercase letters. This is the data that the application will write.

  char* buffer = malloc(sizeof(char) * bufferSize);
  if(buffer == NULL) {
    fprintf(stderr, "Failed to allocate memory!\n");
    return -2;
  }
  int i;
  for (i=0; i<bufferSize; i++) {
    buffer[i] = 'a' + i%26;
  }
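
The fill loop above can also be written as a standalone function, which makes the resulting pattern easy to check. This is a small sketch and the name fill_pattern() is hypothetical. It produces the same bytes as the loop in the sample.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the sample application.
 * Fills buf[0..len-1] with the repeating lowercase alphabet,
 * producing the same bytes as the loop in the sample. */
void fill_pattern(char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++) {
    buf[i] = (char)('a' + i % 26);
  }
}
```

The buffer is not NUL-terminated, so it should be treated as raw bytes of a known length and not passed to string functions such as strlen().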

Writes an entire file with hdfsWrite()

The application calls the function writeLength():

  int ret = writeLength(fs, writeFile, buffer, bufferSize, fileSize);
  if (ret < 0) {
    goto done;
  }

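This function writes the content of the buffer to the file repeatedly, starting at offset 0, until at least writeSize bytes have been written. Each call to hdfsWrite() writes the whole buffer, so the total number of bytes written is rounded up to a multiple of the buffer size.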

int
writeLength(hdfsFS fs, hdfsFile writeFile, char *buffer, tSize bufferSize, tSize writeSize)
{
  tSize writeBytes = 0;
  tSize ret = 0;
  uint64_t totalWrite = 0;
  if (fs == NULL || writeFile == NULL || buffer == NULL) {
    return -1;
  }
  if (writeSize == 0) {
    return 0;
  }
  for (writeBytes=0; writeBytes<writeSize; writeBytes+=bufferSize) {
    ret = hdfsWrite(fs, writeFile, (void*)buffer, bufferSize);
    if (ret > 0) {
      totalWrite += ret; 
    } else {   
      fprintf(stderr, "hdfsWrite failed with error %d \n", errno);
      hdfsCloseFile(fs, writeFile);
      return -1;
    }
  }
  return 0; 
}
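
Because the loop above always writes whole buffers, it writes more than writeSize bytes when writeSize is not a multiple of bufferSize. The following sketch shows one way to shorten the last chunk so that exactly the requested number of bytes is written. It uses fwrite() on a local stream as a stand-in for hdfsWrite() so that it compiles without hdfs.h. The name write_exact() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical variant of the write loop, not part of the sample application.
 * fwrite() on a local stream stands in for hdfsWrite().
 * Writes exactly total bytes by shortening the last chunk.
 * Returns the number of bytes written, or -1 on a bad argument or write error. */
long write_exact(FILE *out, const char *buffer, size_t bufferSize, size_t total)
{
  size_t written = 0;

  if (out == NULL || buffer == NULL || bufferSize == 0) {
    return -1;
  }
  while (written < total) {
    size_t remaining = total - written;
    size_t chunk = remaining < bufferSize ? remaining : bufferSize;

    if (fwrite(buffer, 1, chunk, out) != chunk) {
      return -1;
    }
    written += chunk;
  }
  return (long)written;
}
```

Rejecting a bufferSize of 0 up front also keeps the loop from running without making progress.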

Seeks an offset and writes from that offset with hdfsWrite()

The application next calls the function writeAtOffset():

  tSize writeBytes = writeAtOffset(fs, writeFile, 0, buffer, bufferSize);
  if (writeBytes < 0) {
    goto done;
  }

This function writes the content of the buffer to the file, starting at the specified offset. If the file already exists, the file is first truncated to this offset before the write operation begins. In this case, the specified offset is 0.

The difference between this function and the previous function is that, before writing, it calls hdfsSeek() to move to the specified offset in the file.

tSize
writeAtOffset(hdfsFS fs, hdfsFile writeFile, tOffset offset, 
               char *buffer, tSize bufferSize)
{
  tSize ret = 0;
  if (fs == NULL || writeFile == NULL || buffer == NULL) {
    return -1;
  }
  ret = hdfsSeek(fs, writeFile, offset);
  if (!ret) {
    //hdfsWrite will return -1 if ret != number of bytes asked to 
    //be written.
    ret = hdfsWrite(fs, writeFile, buffer, bufferSize);
    if (ret < 0) {
      fprintf(stderr, "hdfsWrite failed with error %d \n", errno);
    }
  } else {
    fprintf(stderr, "hdfsSeek failed with error %d \n", errno);
  }
  if (ret < 0) {
    //hdfsWrite does a flush in case of an error, explicit flush
    //is not required.
    hdfsCloseFile(fs, writeFile);    
  }
  //Current offset within the file will be positioned at (offset + writeBytes)th byte.
  return ret;
}
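
The seek-then-write pattern can be tried on a local file with the standard C library, where fseek() and fwrite() stand in for hdfsSeek() and hdfsWrite(). This is only an analogue under that assumption: the name seek_and_write() is hypothetical, and a local stream does not reproduce the truncation behavior described above for MapR-FS.

```c
#include <stdio.h>

/* Local analogue of the seek-then-write pattern, not part of the sample.
 * fseek() and fwrite() stand in for hdfsSeek() and hdfsWrite().
 * Returns the number of bytes written, or -1 on failure. */
long seek_and_write(FILE *f, long offset, const char *buffer, size_t len)
{
  if (f == NULL || buffer == NULL || offset < 0) {
    return -1;
  }
  if (fseek(f, offset, SEEK_SET) != 0) {
    return -1;
  }
  if (fwrite(buffer, 1, len, f) != len) {
    return -1;
  }
  /* The stream position is now offset + len. */
  return (long)len;
}
```

As in the sample, the current position ends up just past the bytes that were written, so a subsequent sequential write continues from there.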

Performs a positional write with hdfsPwrite()

The application next calls the function positionalWrite():

  writeBytes = positionalWrite(fs, writeFile, 20, buffer, bufferSize);
  if (writeBytes < 0) {
    goto done;
  }

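This function writes the content of the buffer to the file, starting at the offset that you specify. In this case, the offset is 20. Unlike hdfsWrite(), hdfsPwrite() does not advance the current offset within the file.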

tSize
positionalWrite(hdfsFS fs, hdfsFile writeFile, tOffset offset, 
               char *buffer, tSize bufferSize)
{
  tSize writeBytes = 0;
  if (fs == NULL || writeFile == NULL || buffer == NULL) {
    return -1;
  }
  writeBytes = hdfsPwrite(fs, writeFile, offset, buffer, bufferSize);
  if (writeBytes < 0) { 
    fprintf(stderr, "hdfsPwrite failed with error %d \n", errno);
    hdfsCloseFile(fs, writeFile);
  }
  //Current offset within the file will not be advanced if hdfsPwrite is used
  return writeBytes;
}
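
The effect of a positional write on the current offset can be illustrated on a local file. The standard C library has no direct equivalent of hdfsPwrite(), so this sketch emulates one by saving the stream position, writing at the requested offset, and restoring the position. The name positional_write() is hypothetical, and the emulation is an assumption made for illustration, not a description of how hdfsPwrite() is implemented.

```c
#include <stdio.h>

/* Emulates a positional write on a local stream, not part of the sample.
 * Writes len bytes at offset, then restores the previous stream position.
 * Returns the number of bytes written, or -1 on failure. */
long positional_write(FILE *f, long offset, const char *buffer, size_t len)
{
  long saved;
  long result;

  if (f == NULL || buffer == NULL || offset < 0) {
    return -1;
  }
  saved = ftell(f);
  if (saved < 0 || fseek(f, offset, SEEK_SET) != 0) {
    return -1;
  }
  result = (fwrite(buffer, 1, len, f) == len) ? (long)len : -1;
  /* Restore the position even if the write failed. */
  if (fseek(f, saved, SEEK_SET) != 0) {
    return -1;
  }
  return result;
}
```

After the call, the stream position is the same as before it, which mirrors the comment in positionalWrite() about the offset not being advanced.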

Closes the file

hdfsCloseFile(fs, writeFile);

Frees the buffer

free(buffer);

Disconnects from the filesystem

hdfsDisconnect(fs);
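
The calls shown earlier jump to a done label on failure, but the label itself does not appear in this walkthrough. The following sketch shows one common way to structure single-exit cleanup in C. It is an assumption about the general pattern and not the sample's actual code: tmpfile() and fclose() stand in for hdfsOpenFile() and hdfsCloseFile(), and the name run_with_cleanup() is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of single-exit cleanup, not the sample's actual code.
 * tmpfile() and fclose() stand in for hdfsOpenFile() and hdfsCloseFile().
 * Returns 0 on success and -1 on any failure. */
int run_with_cleanup(size_t bufferSize)
{
  int status = -1;
  char *buffer = NULL;
  FILE *file = NULL;

  if (bufferSize == 0) {
    goto done;
  }
  file = tmpfile();
  if (file == NULL) {
    goto done;
  }
  buffer = calloc(bufferSize, 1);   /* zero-filled so the write is well defined */
  if (buffer == NULL) {
    goto done;
  }
  if (fwrite(buffer, 1, bufferSize, file) != bufferSize) {
    goto done;
  }
  status = 0;

done:
  /* Release resources; each step is safe if the resource was never acquired. */
  free(buffer);
  if (file != NULL) {
    fclose(file);
  }
  return status;
}
```

Because the helper functions shown earlier already call hdfsCloseFile() when a write fails, cleanup code in the real application would need to avoid closing the same handle a second time.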