The CLDB DiskBalancer thread monopolized the LoadTracker lock, resulting in delayed heartbeat processing and slow heartbeat alarms.
In every heartbeat from the fileservers, CLDB receives the list of storage pools and the associated usage information. The heartbeat processing thread then places the storage pools into different bins/buckets based on their utilization. The disk balancer and the replication manager use this bucketing to pick a good storage pool on which to create a replica. However, all of these components take the LoadTracker object lock, which can delay heartbeat processing by as long as 10 seconds.
|With this fix, the bucketing is moved out of the heartbeat processing code and performed as a delayed operation by queuing a task to a separate thread pool. To avoid queuing a task for every heartbeat, CLDB first checks whether a delayed task is already scheduled for the fileserver; if so, no new task is queued and the existing task is allowed to complete. When the delayed task is done, another delayed task to bucket the storage pools is scheduled during a subsequent heartbeat.|
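The bucketing step can be pictured with a minimal sketch; the bucket boundaries, names, and data shapes below are illustrative assumptions, not the actual CLDB implementation:

```python
from collections import defaultdict

# Illustrative utilization-percentage upper bounds, not the real CLDB values.
BUCKET_BOUNDS = [20, 40, 60, 80, 100]

def bucket_storage_pools(pools):
    """Group storage pools into utilization buckets.

    `pools` maps a storage-pool ID to its utilization percentage,
    as reported in a fileserver heartbeat.
    """
    buckets = defaultdict(list)
    for sp_id, used_pct in pools.items():
        for bound in BUCKET_BOUNDS:
            if used_pct <= bound:
                buckets[bound].append(sp_id)
                break
    return buckets

buckets = bucket_storage_pools({"sp1": 15, "sp2": 55, "sp3": 83})
```

A structure like this lets the disk balancer and replication manager pick a lightly used pool quickly, which is why it is worth computing outside the heartbeat path.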
|MapR-CLDB||23488||When a volume with a replication factor below 3 and no minimum number of replication was created, the replication manager set the minimum replication value to 1 and containers with only one copy were not re-replicated.||With this fix, by default, the minimum number of copies is 1 if the replication factor is less than or equal to 2.|
|MapR-CLDB||23725||Concurrent access to volume ACEs without lock was spinning CPU.||With this fix, concurrent access to volume ACEs will no longer spin CPU.|
|MapR-CLDB||24008||On an NFS node, CLDB tried to get the port from the IP list of a server that did not send any IPs, which caused CLDB to crash.||With this fix, CLDB no longer attempts to get the port and therefore no longer crashes.|
|MapR-DB||22924||MapR-DB client applications calling the hb_get_add_column() C function were required to use qualifier names in lexicographic order when creating more than one column in a column family.||With this fix, the requirement is removed.|
User credentials were not set explicitly whenever a client application accessed a MapR-DB table, which caused an EACCES error in the following type of situation:
For connecting to a single MapR cluster, a client application written in C creates two connection objects (connA and connB), using separate user credentials (for userA and userB) for each connection object. A single application thread is used for table operations with both connA and connB. This thread performs these operations:
1. As userA, the thread creates table X via connA. The MapR dbclient, which mediates connections between client applications and MapR clusters running MapR-DB, caches the credentials for userA in thread local storage.
2. As userB, the thread deletes table Y via connB. The dbclient overwrites userA's credentials in thread local storage with the credentials for userB.
3. As userA, the thread attempts a put operation on table X. The dbclient does not overwrite the credentials in thread local storage with userA's credentials before attempting to access the table. Because the stored credentials belong to userB, the attempt to access the table fails.
|With this fix, the dbclient overwrites the credentials in thread local storage with the current user's credentials before attempting to access tables.|
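The failure mode and the fix can be modeled with a small, illustrative sketch (this is a Python stand-in for the C dbclient; the class and method names are hypothetical):

```python
import threading

_tls = threading.local()  # per-thread credential cache, as in the dbclient

class Connection:
    """Illustrative stand-in for a dbclient connection bound to one user."""
    def __init__(self, user):
        self.user = user

    def access_table(self, table):
        # The fix: always overwrite the cached credentials with this
        # connection's user before touching the table, instead of
        # relying on whatever a previous operation left behind.
        _tls.credentials = self.user
        return f"{table} accessed as {_tls.credentials}"

connA, connB = Connection("userA"), Connection("userB")
connB.access_table("Y")            # leaves userB in thread local storage
result = connA.access_table("X")   # the fix re-stamps userA's credentials first
```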
When a client application written in C successfully deleted a table in MapR-DB, error messages such as the following would be logged:
2016-04-27 14:30:58,0688 ERROR Client fs/client/fileclient/cc/client.cc:2372
Thread: 1120 Unlink failed for file /user/temp/test.table, error Invalid
2016-04-27 14:30:58,0688 ERROR Inode fs/client/fileclient/cc/inode.cc:485
Thread: 1120 Unlink failed on file /user/temp/test.table with error 22
|With this fix, such error messages are no longer logged.|
If a table was a source in table replication, an incremental bulk load of one or more non-replicated column families would cause the mfs service to core.
For example, suppose a source table contained the column families cf1, cf2, and cf3. Only cf3 was being replicated. If an incremental bulk load was started for cf1 and cf2, the mfs service cored.
|With this fix, the mfs service no longer cores in this type of situation.|
|MapR-DB||23312||The duration specified while generating cross-cluster tickets was not being set and the default duration of 14 days was being applied instead.||
|With this fix:
· For admin-generated cross-cluster and service tickets, the default duration is now LIFETIME.
Note: Service and cross-cluster tickets are no longer bounded by the CLDB duration properties.
· For password-authenticated tickets, if a duration is specified, the specified duration is honored; if a duration is not specified, the default duration configured in the CLDB properties is used.
· The maprlogin print command now prints tickets of any type.|
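The duration-selection rules above can be sketched as follows; the sentinel value, function name, and parameters are assumptions for illustration only:

```python
LIFETIME = -1  # illustrative sentinel for an unbounded ticket

def ticket_duration(ticket_type, requested=None, cldb_default=14 * 24 * 3600):
    """Pick a ticket duration following the rules above (sketch only)."""
    if ticket_type in ("service", "crosscluster"):
        # Admin-generated tickets honor an explicit duration and
        # otherwise default to LIFETIME; they are not bounded by the
        # CLDB duration properties.
        return requested if requested is not None else LIFETIME
    # Password-authenticated tickets honor an explicit duration,
    # otherwise fall back to the CLDB-configured default.
    return requested if requested is not None else cldb_default
```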
|MapR-DB||23382||CLDB failed over with an exception when a node with stale containers was removed.||With this fix, a node with stale containers can be removed successfully from the cluster and CLDB exceptions are not thrown.|
In this type of situation in MapR-DB, the first of a series of puts for a row would succeed, while the remaining puts in the series would fail without errors:
1. A tablet T is split into T1 and T2. The dbclient still has tablet T cached with the original key range.
2. The dbclient issues a series of puts against a rowkey that used to be in T, but which is now in T1.
3. The server returns an ERANGE error for the first put, but not for the remaining puts in the series.
4. The dbclient retries the first put and succeeds, but does not retry the remaining puts because the dbclient never received the ERANGE error for those puts.
This problem could occur for different types of errors that applied to all of the puts issued together for a single row.
|With this fix, the server returns the relevant error message for all of the puts in a series for a single row.|
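The corrected server behavior can be modeled generically (the rowkeys, key ranges, and names below are hypothetical):

```python
def server_put_batch(tablet_range, puts):
    """Illustrative server-side check: reject *every* put whose rowkey
    falls outside the tablet's current key range, not just the first one."""
    errors = {}
    lo, hi = tablet_range
    for i, (rowkey, _value) in enumerate(puts):
        if not (lo <= rowkey < hi):
            errors[i] = "ERANGE"  # returned for each affected put
    return errors

# The client cached tablet T covering ["a", "m"), but the row moved to T1.
puts = [("p", "v1"), ("p", "v2"), ("p", "v3")]
errors = server_put_batch(("a", "m"), puts)
# Because every put now gets ERANGE, the client retries all of them
# against the correct tablet instead of silently dropping v2 and v3.
```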
|MapR-DB||23541||A dlopen of libmapr_pam.so using immediate symbol resolution threw an undefined symbol error.||An updated libmapr_pam.so that links to libpam.so has been provided.|
For a binary table, if the size of one row was larger than the value specified by "mfs.db.max.rowsize.kb", the scan operation would hang without throwing an exception.
|With this fix, if the scan request fails, the client stops retrying and throws an exception.|
|MapR-FileClient||23303||In C client applications for MapR-FS, calling hdfsConnectAsUserNewInstance() with an invalid user and then calling the same function with a valid user caused mfs to core.||With this fix, subsequent calls with a valid user no longer trigger the memory corruption (caused by the failed user resolution during the first call) and no longer result in a core dump.|
|MapR-FileClient||23687||When a user tried to use hdfsOpenFile(), hdfsWrite(), hdfsFlush(), and/or hdfsCloseFile() on a file on which the user did not have the right permissions for the operation, 0 was returned instead of the right error code.||With this fix, when hdfsOpenFile(), hdfsWrite(), hdfsFlush(), and/or hdfsCloseFile() is used by a user without the right permissions for the operation on the file, the operation will fail and the appropriate error code will be returned.|
|MapR-FileClient||23715||The MFS C and Java APIs did not return the requested number of bytes.||With this fix, both the C and Java APIs return the requested number of bytes when they are available.|
On an unsecure cluster, C client applications were able to impersonate even when the:
· MAPR_IMPERSONATION_ENABLED environment variable was not set to true
· Impersonating user did not have a file under /opt/mapr/conf/proxy
|With this fix, to enable impersonation on an unsecure cluster for C client applications, the MAPR_IMPERSONATION_ENABLED environment variable must be set to true and the impersonating user must have a file under /opt/mapr/conf/proxy.|
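The two conditions the fix enforces can be sketched as a simple check (the function name is hypothetical; the environment variable and proxy directory are those named above):

```python
import os

def impersonation_allowed(user, proxy_dir="/opt/mapr/conf/proxy"):
    """Check the two conditions enforced on unsecure clusters:
    the environment flag must be set to true and the impersonating
    user must have a proxy file. (Illustrative sketch only.)
    """
    if os.environ.get("MAPR_IMPERSONATION_ENABLED", "").lower() != "true":
        return False
    return os.path.isfile(os.path.join(proxy_dir, user))
```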
|MapR-FileClient||23762||When compiling code with hdfsExists2(), an error was returned as hdfs.h did not expose hdfsExists2().||The hdfs.h file has been updated to include hdfsExists2().|
||With this fix, the command will not loop its output indefinitely.|
When fast failover was enabled and star replication was implemented (for example, with a star replica chain of A-B, A-C), if B went down, the resulting error caused the replication operation not to be sent to C.
|With this fix, the response is no longer called inline, so the replica operation is sent to C.|
|MapR-FileServer||23944||In some cases, when a local write times out with ETIMEDOUT error, the NFS server re-uses shared pages, before mfs releases those pages, resulting in mfs crash.||With this fix, on ETIMEDOUT error for local writes in NFS server, NFS server will not reuse those pages.|
|MapR-FS||22491||A typographical error in log messages prevented MapR support from accurately confirming that container resync errors were causing mirroring failures.||With this fix, the typographical error is corrected.|
In situations where a client application looped between creating and deleting the same MapR-DB table, either of the following two circumstances could lead to a fileserver deadlock, preventing any other MapR filesystem operations in the volume hosting the table:
· The creation of a snapshot of the volume was triggered.
· A node hosting one of the containers of the table data failed.
|With this fix, fileserver deadlocks are no longer possible in these situations.|
|MapR-FS||22698||MFS crashed because of a race condition between evaluating a volume ACE and refreshing the volume ACE.||With this fix, MFS will no longer crash as updating the volume properties in schema happens in DB thread only (thus preventing a race condition between MFS and DB threads).|
|MapR-FS||22829||On installations with two MFS instances per node on (SSD-based) clusters, instead of assigning one license per node, MFS was using one license per instance. For example, a 10 node cluster with SSDs required 20 licenses instead of 10 licenses.||With this fix, MFS will no longer assign multiple licenses to a node if more than one MFS instance is running on that node.|
|MapR-FS||22860||Client applications holding two or more connections to the server could experience RPC timeouts in the following type of situation: After one connection establishes a session key with the server, all of the connections remain idle long enough to trigger a session key renewal on the server. Two or more requests are then sent in parallel on different connections. The first request processed on the server triggers a change of the previous session key to the new session key. The remaining requests subsequently reaching the server on the other connections have the old session key, rather than the new session key.||With this fix, the requests with the old session key are now discarded by the server and the client retransmits the requests with the new session key after a timeout that generally lasts from one to two minutes.|
|MapR-FS||22873||Occasionally, writes to MapR-FS by C client applications were not visible immediately after a flush. Therefore, these writes appeared to be missing.||With this fix, the data inconsistency issue is resolved.|
|MapR-FS||22881||When mirroring was started for a volume, a new container, if not present, was created for each container in the source volume and the new containers were deleted if the mirroring was stopped. While deleting the new containers, the volume mirror module missed the last container in each iteration because the volume mirror module was incrementing the start key container ID (CID) during each iteration.||With this fix, the volume mirror module will query the list of containers without missing a container and delete them.|
|MapR-FS||22883||The MFS did not handle different formats in the CPU list for setting affinity on NUMA nodes. The MFS was only handling consecutive hyphen separated bit ranges. For example, MFS would only handle “0-3,8-11” and would not handle “0,4,8,12”.||With this fix, MFS will now support both comma and hyphen separated CPU list for setting NUMA affinity.|
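A parser that accepts both CPU-list formats can be sketched as follows (illustrative only, not the MFS implementation):

```python
def parse_cpu_list(spec):
    """Parse a CPU list that mixes comma-separated IDs and hyphenated
    ranges, e.g. "0-3,8-11" or "0,4,8,12"."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))  # inclusive range
        else:
            cpus.append(int(part))
    return cpus
```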
|MapR-FS||22898||On nodes with more than one MFS instance, if the node topology changed, the local volume topology did not change and the topology of the local volume continued to show the old topology. For example, on a node with 2 MFS instances, if the node topology was pointing to /data/default-rack/nodeIP and then changed to /data/rack/nodeIP, the local topology continued to point to /data/default-rack/nodeIP instead of /data/rack/nodeIP.||With this fix, on nodes with more than one MFS instance, local volume topology will change if the node topology changes.|
|MapR-FS||22948||The mfs service would terminate unexpectedly in the following situation: the primary disk in a storage pool was removed, thereby removing the metadata stored on that disk about the storage pool; an attempt was then made to add a secondary disk to the storage pool.||With this fix, the mfs service no longer terminated unexpectedly in this situation.|
|MapR-FS||22949||Occasionally, C client applications connected to MapR-FS experienced short reads during concurrent reads or concurrent reads and writes.||With this fix, the short read issue is resolved.|
|MapR-FS||23032||For a volume with a replication factor of 3 on a topology of 3 nodes, sometimes all master containers were created on the same node. This skew in master containers caused a drop in performance.||With this fix, the containers are distributed across all nodes.|
|MapR-FS||23131||The NFS server running on a 5.1 cluster hung when it tried to access a cluster running an older version. The new features introduced in 5.1 are not available on older releases, and the NFS server, instead of returning an error, hung while attempting to access the cluster running the older version.||With this fix, the NFS server returns a "feature not present" error instead of hanging.|
|MapR-FS||23186||When a MapR-FS client application written in C called hdfsDisconnect(), the corresponding file system handle was not deleted, resulting in memory leaks.||With this fix, file handles are now deleted when hdfsDisconnect() is called.|
Running the ls command from a valid (working) cluster returned the following error when the clusters.conf file included an invalid (non-working) cluster entry:
ls: cannot open directory .: No such file or directory
|With this fix, the ls command no longer returns the aforementioned error if the clusters.conf file includes an invalid (non-working) cluster entry.|
|MapR-FS||23315||Many GetXAttr (get extended attribute) calls (for file ACEs) were made on the NFS mount irrespective of the file type. This resulted in a lot of GetXAttr calls on the NFS mount for normal file operations.||With this fix, GetXAttr calls will only be made for a special file (.dfs_attributes).|
|MapR-FS||23331||When re-reading large files, some cache misses were seen despite warm cache.||With this fix, re-reading large files will not result in cache misses.|
|MapR-FS||23629||While allocating large number of inodes during resynchronization of containers, the source container would timeout if destination container did not respond within 5 minutes.||With this fix, instead of sending large number of inodes during resynchronization, multiple commands with a fixed number of inodes per command will be sent to allocate the required number of inodes.|
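The chunking strategy described in this fix can be sketched generically; the per-command chunk size below is an assumption, not the actual MFS value:

```python
def allocation_commands(total_inodes, per_command=1024):
    """Split one large inode allocation into multiple fixed-size commands,
    so the destination never has to process one huge request at once
    and the source does not time out waiting for it."""
    commands = []
    remaining = total_inodes
    while remaining > 0:
        commands.append(min(per_command, remaining))
        remaining -= commands[-1]
    return commands
```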
|MapR-FS||23676||When file ACEs (Access Control Expressions) were set on a symbolic link, flags were set to indicate that the ACEs were set on the symlink file itself.||With this fix, the flags are no longer set to indicate that ACEs were set on the symlink file itself.|
|MapR-FS||23795||Some storage pools were going offline frequently with CRC errors because:
||With this fix:
|MapR-Streams||23178||Consumer applications reading messages from topics in MapR Streams generate globally unique identifiers (GUIDs) that the server can use to identify individual consumers. Consumers running on OS X could occasionally generate GUIDs in formats that the server would not recognize.||With this fix, consumers running on OS X always generate GUIDs in the correct format.|
|Installation and Configuration||23966||The Hadoop 2.7.0 shared library did not include the correct Apache Avro library.||With this fix, the avro-1.7.6.jar file is included in the Hadoop 2.7.0 shared library.|
|Installation and Configuration||23770||/opt/mapr/conf/env.sh overrode the LD_PRELOAD environment variable.||With this fix, /opt/mapr/conf/env.sh no longer resets the LD_PRELOAD environment variable. If LD_PRELOAD is already configured, env.sh appends the libpam library path to the existing value.|
|Installation and Configuration||23459||When Warden started before its dependent services were available, Warden did not start MapR services such as MapR-FS.||With this fix, Warden will start MapR services only after its dependent services such as dns, multi-user.target, and su have started on the node where Warden runs.|
|Installation and Configuration||22941||Warden failed to start Oozie on clusters where the MAPR_USER was not defined.||
With this fix, Warden successfully starts Oozie after configure.sh -R is run on the node. When you run configure.sh -R to start Oozie, it also sets the appropriate permissions on the Oozie directory.
|MapR Build||23950||The Kafka client jar 0.9.0.0-mapr-1602-streams-5.1.0 pointed to com.mapr.streams mapr-streams 5.1.0-mapr.||A new kafka-client artifact that pulls the latest MapR-Streams EBF jar has been published.|
|Package/Deployment||22978||If posix-client packages (mapr-posix-client-basic/platinum) were installed on both a MapR core cluster and a client node, the mapr-patch-posix-client patch did not work as expected.||With this fix, the posix client patch can be applied to both the MapR cluster and the client node.|
|YARN||22808||The calculation of the preemption utilization threshold of the Fair Scheduler's Dominant Resource Fairness (drf) scheduling policy did not consider disk usage as a resource. Instead, the preemption utilization threshold was calculated based on memory and CPU alone.||With this fix, the drf scheduling policy considers memory, CPU, and disk usage when allocating resources to applications. For example, because MapReduce jobs require disk resources, preemption will now occur when the disk resources are at capacity.|
|YARN||23745||On a secure cluster, Pig jobs failed because zero-configuration Resource Manager HA did not handle the case where the filesystem set in the job configuration object is not the MapR-FS.||With this fix, zero-configuration Resource Manager HA now handles the case where the filesystem set in the job configuration object is not the MapR-FS.|
|YARN||23791||The ResourceManager UI did not show details about task attempts. Instead, it only showed that a task was "Processing..." until the job completed.||With this fix, the task attempts page works as expected.|