hadoop mfs

The hadoop mfs command displays directory information and contents, creates symbolic links and hard links, sets, gets, and removes Access Control Expressions (ACE) on files and directories, and sets compression and chunk size on a directory.

Syntax

hadoop mfs
  [ -count <path> ]
  [ -delace [-R] <path> ]
  [ -getace [-R] <path> ]
  [ -help <command> ]
  [ -ln <target> <symlink> ]
  [ -lnh <target> <hardlink> ]
  [ -ls <path> ]
  [ -lsd <path> ]
  [ -lsf <path> ]
  [ -lso <path> ]
  [ -lsor <path> ]
  [ -lsr <path> ] 
  [ -Lsr <path> ] 
  [ -lsrv <path> ]
  [ -lss <path> ]
  [ -offload <file_path> [-v] ]
  [ -recall <file_path> [-v] ]
  [ -rmr <path> ]    
  [ -setace [-R] 
     [-readfile <ace>] [-writefile <ace>] [-executefile <ace>] 
     [-addchild <ace>] [-deletechild <ace>] [-lookupdir <ace>] [-readdir <ace>] 
     [-aces "[rf:<ace>],[wf:<ace>],[ef:<ace>],[ac:<ace>],[dc:<ace>],[rd:<ace>],[ld:<ace>]"] 
     [-preservemodebits <true|false>] [-setinherit <true|false>] <path> ]
  [ -setaudit  on|off <dir|file|table> ]
  [ -setcompression  on|off|lzf|lz4|zlib <dir|table> ]
  [ -setchunksize <size> <dir> ]
   
  [ -setnetworkencryption on|off <target> ]
  [ -stat <path> ]
  [ -tierstatus <file_path> [-v] ]
  [ -addsecuritypolicytag [-R] <comma-separated list of security policy tags> <path> ]
  [ -getsecuritypolicytag [-R] <path> ]
  [ -removesecuritypolicytag [-R] <comma-separated list of security policy tags> <path> ]         
  [ -removeallsecuritypolicytag [-R] <path> ] 
  [ -setsecuritypolicytag [-R] <comma-separated list of security policy tags> <path> ]          
         

Parameters

The normal command syntax is to specify a single option from the following table, along with its corresponding arguments. If you do not set compression and chunk size for a given directory, the values are inherited from the parent directory.

Parameter

Description

-count <path> Counts and returns the number of directories and (regular, symbolic link, volume link, kvstores, and device) files in the specified path (recursively).
-delace [-R] <path> Deletes all ACEs associated with the specified file or directory and sets Access Control Expression (ACE)s for the specified file or directory to the default value, which is the empty string. Here:
  • [-R] — Enables recursion allowing you to perform the operation in subdirectories as well.
  • <path> — Specifies the path to the file or directory.

You cannot delete specific access types with this parameter. Instead, if necessary, reset the value for the specific access type to an empty string using the -setace parameter. If you use an empty string to deny a specific type of access, then that type of access is denied to all users. To deny specific types of access to specific users only, use the negation operator (!). The mode bits corresponding to the ACEs being deleted, do not change.

-getace [-R] <path> Returns the permissions -- POSIX mode bits and ACEs -- for the given file or (recursively) for the directory. Recursion is enabled only if -R is specified; if -R is not specified, this parameter returns the permissions only for the given directory. Here:
  • [-R] — (Optional) Enables recursion allowing you to perform the operation in subdirectories as well.
  • <path> — (Required) Specifies the path to the file or directory.

If one or more ACEs are available for the file or directory, a plus sign (+), which indicates that both ACEs and POSIX mode bits are set for the given file or directory, is returned. If the ACE on the file or directory is an empty string, the plus sign is not returned.

-help <command> Displays help for the hadoop mfs command.
-ln <target> <symlink> Creates a symbolic link <symlink> that points to the target path <target>, similar to the standard Linux ln -s command.
-lnh <target> <hardlink> Creates a hardlink that associates a new name or a file path with an existing file. You must specify the following:
<target>
File name, including the full path, of the file to link to.
<hardlink>
New name, including the full path, of the file to link with.
-ls <path> Lists files in the directory specified by <path>. The hadoop mfs -ls command corresponds to the standard hadoop fs -ls command, but provides the following additional information:
  • Chunks used for each file
  • Server where each chunk resides
  • Whether compression is enabled for each file
  • Whether encryption is enabled for each file
  • Whether audit is enabled (A) or disabled (U) for each file
-lsd <path> Lists files in the directory specified by <path>, and also provides information about the specified directory itself:
  • Whether compression is enabled for the directory (indicated by z )
  • The configured chunk size (in bytes) for the directory.
-lsf <path> Lists just the file ID (fid) and the file name, for each file present in the specified path. The output is not sorted.

Use this option when there are millions of files in a directory. In this scenario. using this option results in the fastest listing of files, as only the fids and the file names are returned. All other file attributes are ignored.

-lso <path> Lists files in the directory specified by <path>. The hadoop mfs -lso command corresponds to the standard hadoop fs -ls command, but provides the following additional information:
  • Whether compression is enabled for each file
  • Whether encryption is enabled for each file
  • Whether audit is enabled (A) or disabled (U) for each file
This command is faster than hadoop fs -ls as it uses an optimized printing method to dump data on screen.
-lsor <path> Recursively lists files in the directory specified by <path>. This command is the recursive variant of the hadoop mfs -lso command.
-lsr <path> Recursively lists files in the directory and subdirectories specified by <path>. The hadoop mfs -lsr command corresponds to the standard hadoop fs -lsr command, but provides the following additional information:
  • Chunks used for each file
  • Server where each chunk resides
-Lsr <path> Equivalent to lsr, but additionally dereferences symbolic links
-lsrv <path> Lists all paths recursively without crossing volume links.
-lss <path> Lists files in the directory specified by <path>, with an additional column that displays the number of disk blocks per file. Disk blocks are 8192 bytes.
-offload <file_path> [-v] The file to offload to the storage tier. This is a blocking operation; the control is not returned until the operation is complete and the file has been offloaded. Use -v (for verbose) to view the status of the ongoing offload operation.
-recall <file_path> [-v] The file to recall from the storage tier. This is a blocking operation; the control is not returned until the operation is complete and the file has been recalled. Use -v (for verbose) to view the status of the ongoing recall operation.
-rmr <path> Recursively deletes files and directories in the specified path. This is a highly optimized version of the normal generic hadoop fs rmr command and is 10X faster for large directories. This option is useful when one or more directories in the specified path contains many (millions of) files.
-setace [-R] [-readfile <ace>] [-writefile <ace>] [-executefile <ace>] [-addchild <ace>] [-deletechild <ace>] [-lookupdir <ace>] [-readdir <ace>] [-aces "[rf:<ace>],[wf:<ace>],[ef:<ace>],[ac:<ace>],[dc:<ace>],[rd:<ace>],[ld:<ace>]"] [-preservemodebits <true|false>] [-setinherit <true|false>] <path> Sets or modifies the read, write, and execute permissions for files or directories. This argument will:
  • Overwrite existing values with new values, if specified, for access types that were previously set.
  • Set new values for access types that have not yet been set.
  • Not set or modify access types that were not passed in with the command, whether they were previously set or unset.

Specify the ACEs immediately after the -setace parameter. Specify all the other parameters, after the ACE.

Here:
-R
Enables recursion allowing you to perform the operation in subdirectories as well. Recursion is enabled only if -R is specified; if -R is not specified, sets or modifies the permissions for the given directory only.
<ace>
ACE Syntax defined using boolean expressions and sub-expressions.
-readfile
-writefile
-executefile
Specifies permissions (defined using ACEs) for reading, writing, or executing the file.
-readdir
-lookupdir
Specifies permissions (defined using ACEs) for accessing the directory. To permit users to read files in the directory, grant the lookupdir access permission. To permit users to write to or execute files in the directory, grant the readdir and lookupdir access permissions.
-addchild
-deletechild
Specifies permissions (defined using ACEs) for adding or deleting subdirectories.
-aces
Specifies ACEs as a single string. Specify a comma-separated list of ACEs within quotes, up to 60 KB in length. You can set the following permissions.
rf Refers to read file access.
wf Refers to write file access
ef Refers to execute file access
rd Refers to read directory access.

To write to or execute files in the directory, grant the user rd and ld permissions.

ld Refers to lookup directory access.

To read the files in the directory, grant the user ld access. To write to or execute files in the directory, grant the user both rd and ld permissions.

ac Refers to add child directory access.
dc Refers to delete child directory access.
-preservemodebits
Indicates whether to preserve or modify POSIX mode bits. Values can be:
  • true - preserve existing POSIX mode bits
  • false - change POSIX mode bits to match the given ACE setting
The default value is false.
-setinherit
Indicates whether or not to allow all files and subdirectories under the specified directory to inherit the ACE of the parent directory. Values can be:
  • true - inherit ACE from parent
  • false - do not inherit ACE from parent
The default value is true.

If true, new files and directories under the parent directory,inherit the ACEs and corresponding POSIX mode bits from the parent directory.

If false, files and directories under the parent directory do not inherit the ACE of the parent directory. Instead, by default, the files and directories get the default ACE, which is the empty string for all access types; POSIX mode bits are set on the files and directories in the traditional way.

<path>
Specifies the path to the file or directory.
-setaudit on|off <dir|file|table> Enables auditing of the specified directory, file, or HPE Ezmeral Data Fabric Database table.

Enabling auditing of a directory does not enable auditing of files and subdirectories that exist in the directory. You must enable auditing on those existing files and subdirectories. However, any new files and subdirectories that you create will automatically be enabled for auditing. See How Does Auditing Work?.

For operations on the object to be logged, auditing also needs to be enabled on the cluster and the volume in which the object is located. See Managing Auditing for details. If auditing is enabled for a directory, new files and directories created within that directory are also enabled for auditing.

-setchunksize <size> <dir> Sets the chunk size in bytes for the directory specified in <dir>. The <size> parameter must be a multiple of 65536.
-setcompression on|off|lzf|lz4|zlib <dirtable> Turns compression on or off on the directory specified in <dir> or on the specified table, and sets the compression type to one of the following if compression is not turned off:
  • on — turns on compression using the default algorithm (LZ4)
  • lzf — turns on compression and sets the algorithm to LZF
  • lz4 — turns on compression and sets the algorithm to LZ4
  • zlib — turns on compression and sets the algorithm to ZLIB
-setnetworkencryption on|off <target> Sets network encryption on or off for the filesystem object defined in <target>. The cluster encrypts network target to or from a file, directory, stream, or data-fabric table with network security enabled.
-stat <path> Displays the statistics for the given file. Only the root user and the MAPR_USER user (user name under which data-fabric services run) have permissions to run this command.

The path is required and specifies the path (to the file) on which to run the command. The output fields for this command are as follows.

tierstatus <file_path> [-v] The status of the offload or recall of the given file. If -v (for verbose) is also specified, for the given file, the command specifies whether data is local or offloaded as the final output. If the file:
  • Contains local data, returns the following final output:
    File has local data
  • Is completely offloaded, returns the following final output:
    File does not have local data
See Output for more information.

Output

When used with the -ls, -lsd, -lso, -lsor, -lsr, or -lss options, hadoop mfs displays information about files and directories. For each file or directory hadoop mfs displays a line of basic information followed by lines listing the chunks that make up the file, in the following format:

{mode} {compression} {encryption} {audit} {diskFlush} {replication} {owner} {group} {size} {date} {chunk size} {name} {chunk} {fid} {host} [{host}...] {chunk} {fid} {host} [{host}...] ...

Volume links are displayed as follows:

{mode} {compression} {encryption} {audit} {diskFlush} {replication} {owner} {group} {size} {date} {chunk size} {name} {chunk} {target volume name} {writability} {fid} -> {fid} [{host}...]

The following table describes the values:

mode

A text string indicating the read, write, and execute permissions for the owner, group, and other permissions. See also Managing Permissions.

compression

  • U: uncompressed
  • L: LZf
  • Z (Uppercase): LZ4
  • z (Lowercase): ZLIB
encryption U: unencrypted; E: encrypted
audit U: disabled; A: enabled
disk flush U:disabled; F:enabled

replication

The replication factor of the file (directories display a dash instead)

owner

The owner of the file or directory

group

The group of the file of directory

size

The size of the file or directory

date

The date the file or directory was last modified

chunk size

The chunk size of the file or directory

name

The name of the file or directory

chunk

The chunk number. The first chunk is a primary chunk labeled "p", a 64K chunk containing the root of the file. Subsequent chunks are numbered in order.

fid

The chunk's file ID, which consists of three parts:

  • The ID of the container where the file is stored
  • The inode of the file within the container
  • An internal version number

For volume links, the first fid is the chunk that stores the volume link itself; the fid after the arrow (->) is the first chunk in the target volume.

host

The host on which the chunk resides. When several hosts are listed, the first host is the first copy of the chunk, while subsequent hosts are replicas.

target volume name

The name of the volume pointed to by a volume link.

writability

Displays whether the volume is writable.

When used with the -lsf <path>option, hadoop mfs displays only the file ID (fid) and the file name of each file in the path.

When used with the -stat <path>option, hadoop mfs displays statistics for the given file. For each file, it displays the following:

Output field Description
uid The user ID of the owner.
The last access time.

The may not always be updated. For more information, see the UpdateTimeInterval entry in volume create.

mtime The last modified time.
nlink The number of hard links.
type The type of the file. Value can be one of:
  • regular
  • directory
  • symlink
  • vollink
  • kvstore
  • device
size The size of the file or directory. Depending on the type of file, it can be the actual size or the number of entries.
mode The UNIX style permission mode bits for the file/directory.
networkencryption The network encryption setting. Determines whether network encryption is enabled for this file.
subtype The subtype for the specified type. The following subtypes are supported for some of the types:
regular
  • FSTRegBucket
  • FSTRegCF
  • FSTRegKeyMap
kvstore
  • FSTKvTable
  • FSTKvTabletMap
  • FSTKvSchema
  • FSTKvTablet
  • FSTKvSegMap
  • FSTKvSpillMap
  • FSTKvKeyMap
  • FSTKvXattr
device
  • FSTDevPipe
  • FSTDevSocket
  • FSTDevBlock
  • FSTDevChar

For all other types, subtypes are not valid.

gid The group ID.
compression The compression setting.
When used with tierstatus, the output varies based on whether or not data is local, was offloaded, or was recalled. The output looks similar to the following if:
  • Data was completely offloaded:
    File does not have local data
  • Data could not be completely offloaded or data was recalled:
    File has local data

Examples

View File Information

The hadoop mfs command is used to view file contents. You can use this command to check if compression is turned off in a directory or mounted volume. For example,

# hadoop mfs -ls /
Found 121 items
vrwxr-xr-x  Z E U U   3 mapr mapr        121 2018-08-10 01:07  268435456 /.rw
	       p mapr.cluster.root writeable 2049.50.131362 -> 2049.16.2  physical19.qa.lab:5660 physical20.qa.lab:5660 physical23.qa.lab:5660 
vrwxr-xr-x  Z E U U   3 root root          1 2018-08-09 19:26  268435456 /ATS-VOL1533867958
	       p ATS-VOL1533867958 default 2049.138.131538 -> 2322.16.2  physical20.qa.lab:5660 physical19.qa.lab:5660 physical22.qa.lab:5660 
vrwxr-xr-x  Z E U U   3 root root          1 2018-08-09 21:31  268435456 /ATS-VOL1533875473
	       p ATS-VOL1533875473 default 2049.190.131642 -> 2685.16.2  physical21.qa.lab:5660 physical27.qa.lab:5660 physical23.qa.lab:5660 
drwxr-xr-x  Z E U U   - root root          1 2018-08-09 18:15  268435456 /ATS-VOLUME-1533863729955
	       p 2049.102.131466  physical19.qa.lab:5660 physical20.qa.lab:5660 physical23.qa.lab:5660 
...

In the preceding example, the letter Z indicates LZ4 compression on the directory; the letter U indicates that the directory is uncompressed. In the following example, the listed item is both uncompressed (first U) and unencrypted (second U).

[root@node1-302 ~]# hadoop mfs -ls /hbase
Found 10 items
drwxr-xr-x  Z E U U   - root root          1 2018-08-09 19:26  268435456 /ATS-VOL1533867958/data1533867963
	       p 2322.32.131374  physical20.qa.lab:5660 physical19.qa.lab:5660 physical22.qa.lab:5660
...

The following example demonstrates the usage of the -lsf option:

[root@vm5 logs]# hadoop mfs -lsf /tmp/
            2050.33.262504	/tmp/hosts1
            2050.32.262502	/tmp/hosts2
            2050.35.393704	/tmp/hosts3

Set ACEs

Example 1: The following command shows how to set separate read, write, and execute permissions (using ACE) on a file:

hadoop mfs -setace -readfile p -writefile 'g:group1&!u:user1' -executefile p /file
When the command shown above runs, the POSIX mode bits for:
  • Read access is set for owner, owning group, and others.
  • Write access is set for none.
  • Execute access is set for owner, owning group, others.
All other POSIX mode bits are set to 0.
Example 2: The following command shows how to set read, write, and execute permissions (using ACE) as a single string on the specified directory and force all files and subdirectories under the specified directory to inherit the parent ACE. ACEs that are not specified will be set to the empty string.
hadoop mfs -setace -aces "rf:u:root,wf:group1&!user1,ef:p,rd:u:m7user1" -setinherit true /dir
When the command shown above runs, the POSIX mode bits for listing the contents (r) of the directory is set for owner/user and all other POSIX mode bits on the directory are set to 0; all new directories under this directory will inherit the parent POSIX mode bits. Also, new files in the directory will inherit the following POSIX mode bits:
  • Read access is set to owner/user.
  • Write access is set to none.
  • Execute access set for others.
All other POSIX mode bits are set to 0.
Example 3: The following command shows how to set permissions (using ACE) as a single string on the specified directory and all the files and subdirectories recursively.
hadoop mfs -setace -R -aces "rf:p,wf:g:group1&!u:user1,ef:p" -preservemodebits true /dir

When the command shown above runs, the POSIX mode bits are not modified to match the ACE setting.

Example 4: The following command shows how to deny a specific type of access, writefile, which was set in the first example above, without removing all other access types associated with the file. The empty string used in the following example will deny write access to all users, roles, and groups.
hadoop mfs -setace -writefile "" -preservemodebits false /file

When the command shown above runs, the POSIX mode bit for writing to the file is set to 0.

Get ACEs

The following command shows the ACEs and POSIX mode bits for the specified file only.
hadoop mfs -getace /m7user1/file1.txt

Output

Path: /m7user1/file1.txt
  readfile: !u:m7user1
  writefile: !u:m7user1
  executefile: !u:m7user1
  mode: ---------

Delete ACEs

The following command deletes all ACEs associated with the specified file and sets the ACE for the file to the empty string.
hadoop mfs -delace /file
The following command deletes all ACEs associated with the specified directory.
hadoop mfs -delace /dir
The following command deletes all ACEs associated with the specified directory and ACEs associated with the files and directories (recursively) below the specified directory.
hadoop mfs -delace -R /dir

Create a Hard Link to File

The following command shows how to create a hard link to the file, file1, using a new name, file2.
# hadoop mfs -lnh /madvol1/file1 /madvol1/file2
Creating Hardlink: /madvol1/file2 -> /madvol1/file1

Retrieve the Number of Hard Links

The following command shows how to retrieve the number of hard links (and other statistics) associated with a given file.
# hadoop mfs -stat /vol1/file1
Path: /vol1/file1
  fid: 23185.32.131232
  uid: root
  gid: root
  atime: 2016-06-29 18:49:03
  mtime: 2016-07-01 18:01:54
  nlink: 2
  type: FTRegular
  subtype: FSTInval
  size: 1024000000
  blocksize: 268435456
  mode: 644
  networkencryption: false
  compression: off
View the status of the offload or recall operation for the file named file2 in volume named vol1:
# hadoop mfs -tierstatus  /vol1/file2
File has local data.
View the status of file named test1 in volume named vol1:
# hadoop mfs -tierstatus /vol1/test1 -v
        FID          Has Local Data
2154.109.1049824        Yes
2172.143.524906         Yes
2172.153.524926         Yes
2172.166.524952         Yes
2172.167.524954         Yes
File has local data.

Tag a file with a security policy:

The following command tags the file /user/root/javax.servlet-3.0.jar with three security policies, namely pci, hippa, and new

hadoop mfs -setsecuritypolicytag pci,hippa,new /user/root/javax.servlet-3.0.jar

Retrieve security policy tags from a file:

hadoop mfs -getsecuritypolicytag /user/root/javax.servlet-3.0.jar
            [hippa, new, pci]

Remove all security policy tags from a file:

hadoop mfs -removeallsecuritypolicytag /user/root/javax.servlet-3.0.jar