Troubleshooting Hive and Tez

This section includes Hive and Tez troubleshooting tips.

HDFS Literal Deprecated

Starting in Hive 2.3, the hdfs literal is deprecated. Specifying a table location with the hdfs URI scheme causes queries to fail, because the Hive parser recognizes the hdfs literal in the LOCATION keyword and triggers HDFS encryption, which is not supported.

If you use the hdfs literal with the LOCATION keyword in Hive queries:
CREATE TABLE IF NOT EXISTS i (id INT) LOCATION 'hdfs:///i';
The system logs the following warning:
LOG.warn("hdfs:// is deprecated filesystem and will be removed in future releases. Use maprfs:// instead");
To avoid hdfs literal issues, replace all instances of hdfs with maprfs in tables, partitions, and databases. Also update the hive-site.xml file to remove hdfs from the URI scheme list.
Update hive-site.xml
Remove hdfs from the hive.exim.uri.scheme.whitelist Hive configuration property in hive-site.xml, as shown:
<property>
	<name>hive.exim.uri.scheme.whitelist</name>
	<value>maprfs,…,…,s3</value>
</property>
Update Tables and Partitions
To replace the table and partition location with maprfs, run:
MariaDB [hive]>  update SDS set LOCATION = REPLACE(LOCATION, 'hdfs', 'maprfs') where LOCATION like '%hdfs%';
Update Databases
To replace the database location with maprfs, run:
MariaDB [hive]> update DBS set DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, 'hdfs', 'maprfs') where DB_LOCATION_URI like '%hdfs%';
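The REPLACE() calls above rewrite each stored location string in place. As an illustration of the transformation only (the path below is hypothetical), the same rewrite looks like this in shell:

```shell
# Hypothetical location string; REPLACE(LOCATION, 'hdfs', 'maprfs')
# performs this same substitution on every matching row
old='hdfs:///user/hive/warehouse/t1'
new="${old/hdfs/maprfs}"   # bash substring replacement, first match only
echo "$new"
```

After running the updates, query the SDS and DBS tables again to confirm that no locations still match '%hdfs%'.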

Prohibited usage of datanucleus.schema.autoCreateAll property

Do not use the datanucleus.schema.autoCreateAll property under any circumstances. Instead, run the schematool command. Refer to HIVE-21302 for more information.
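For example, you can initialize the metastore schema with schematool. The path and the -dbType value below are assumptions for a typical MapR Hive 2.3 installation with a MySQL-compatible metastore; adjust them for your environment:

```shell
# Initialize a new metastore schema; use -upgradeSchema instead for an
# existing schema, and -info to inspect the current schema version.
# (Path and dbType are environment-specific assumptions.)
/opt/mapr/hive/hive-2.3/bin/schematool -dbType mysql -initSchema
```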

WebHCat

Secure WebHCat operations depend on the Hive metastore having Kerberos enabled. If Kerberos is not enabled for the Hive metastore, null pointer exceptions similar to the following will appear:

2013-10-06 20:38:55,198 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(134)) - MetaException(message:java.lang.NullPointerException)
       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_delegation_token(HiveMetaStore.java:3972)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
       at com.sun.proxy.$Proxy5.get_delegation_token(Unknown Source)
       at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_delegation_token.getResult(ThriftHiveMetastore.java:8063)
       at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_delegation_token.getResult(ThriftHiveMetastore.java:8047)
       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
       at org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
If you are updating to the mapr-hive-2.3.6-mapr-1912 (EEP 6.3.0) package, manually replace the old webhcat-default.xml configuration file with the new one and restart the WebHCat service:
cp /opt/mapr/hive/hive-2.3/hcatalog/etc/webhcat.new/webhcat-default.xml /opt/mapr/hive/hive-2.3/hcatalog/etc/webhcat/
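One way to restart WebHCat on a MapR cluster is through maprcli. The service name hcat is an assumption here; verify the registered name with maprcli service list on your cluster:

```shell
# Assumption: WebHCat is registered as the 'hcat' service on this node;
# adjust -nodes to target the nodes running WebHCat
maprcli node services -name hcat -action restart -nodes "$(hostname -f)"
```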

Hive in an Azure Cluster

When Hive services are installed on an Azure cluster, the services may fail to start because Azure assigns host names that are too long (over 64 characters). Perform the following steps to fix this issue:

NOTE This issue is fixed on MapR core 6.0.1 starting from build 20180320175756.GA-1.x86_64.
  1. Edit the /etc/hosts file:
    nano /etc/hosts
    This is an example of a /etc/hosts file for an Azure cluster:
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    172.24.8.4 anaikregtestc73522602-cluster-com-mapr-vm0.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm0
    172.24.8.5 anaikregtestc73522602-cluster-com-mapr-vm1.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm1
    172.24.8.6 anaikregtestc73522602-cluster-com-mapr-vm2.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm2
    172.24.8.7 anaikregtestc73522602-cluster-com-mapr-vm3.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm3
    172.24.8.8 anaikregtestc73522602-cluster-com-mapr-vm4.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm4
  2. Add a short alias for each node:
    NOTE You can use any short alias. In this example, vm0, vm1, vm2, vm3, and vm4 are used:
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    172.24.8.4 anaikregtestc73522602-cluster-com-mapr-vm0.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm0 vm0
    172.24.8.5 anaikregtestc73522602-cluster-com-mapr-vm1.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm1 vm1
    172.24.8.6 anaikregtestc73522602-cluster-com-mapr-vm2.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm2 vm2
    172.24.8.7 anaikregtestc73522602-cluster-com-mapr-vm3.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm3 vm3
    172.24.8.8 anaikregtestc73522602-cluster-com-mapr-vm4.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net anaikregtestc73522602-cluster-com-mapr-vm4 vm4
  3. Perform steps 1 and 2 on each node in the cluster.
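To check whether a node's fully qualified host name exceeds the 64-character limit, a quick shell test helps. The FQDN below is the sample value from the /etc/hosts example above; on a real node, use "$(hostname -f)" instead:

```shell
# Flag fully qualified host names longer than 64 characters
# (sample FQDN from the hosts-file example; replace with "$(hostname -f)")
fqdn='anaikregtestc73522602-cluster-com-mapr-vm0.izqafobxqxbuzkv4led1p3snic.dx.internal.cloudapp.net'
if [ "${#fqdn}" -gt 64 ]; then
  echo "host name too long: ${#fqdn} characters"
fi
```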

Tez Upgrade Issues

  • Preserving configuration on Ubuntu is not supported when upgrading from EEP 4.1.1 or EEP 5.0.0 (1803) to EEP 6.0.0 (1808) or EEP 5.0.1 (1808).
  • Preserving Tomcat configuration is not supported from any previous EEP to EEP 6.0.0 (1808).
  • You must manually stop the Tomcat service and delete the Tomcat folder before updating or upgrading Tez from the following EEPs:
    • EEP 4.0.0
    • EEP 4.1.0

User Names, Group Names, and LDAP

LDAP configuration allows you to use group names and user names with spaces, so it is possible to have groups such as domain users. The following structure is possible in the MapR FileSystem:
drwxr-xr-x   - afischer      domain users          0 2018-10-03 16:10 /user/abc
drwxr-xr-x   - mapr          mapr                  0 2018-10-05 16:51 /user/def
drwxr-xr-x   - dschexnayder  domain users          8 2018-10-10 13:30 /user/xyz
drwxr-xr-x   - mapr          mapr                  1 2018-10-09 14:23 /user/hive
drwxr-xr-x   - mapr          mapr                 11 2018-10-10 01:56 /user/mapr
drwxr-xr-x   - mlitovsky     domain users          0 2018-10-06 11:08 /user/hjbs
drwxr-xr-x   - pcurtis       domain users          5 2018-10-04 19:33 /user/jknd
drwxr-xr-x   - mapr          mapr                  3 2018-10-08 16:29 /user/ewkd
drwxr-xr-x   - talvarez      domain users          0 2018-10-04 17:02 /user/lkd
According to HADOOP-12505, Hadoop does not allow spaces in group names, and consequently neither does Hive. Each time a Hive query touches a group name that contains a space, you see the following exception:
-chgrp: 'domain users' does not match expected pattern for group

The workaround is to avoid spaces in group names and user names.
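Until the names are cleaned up, you can list which groups contain spaces with a simple awk filter. The sample lines below are made up for illustration; on a real system, pipe the output of getent group into the same filter:

```shell
# Print group names that contain a space
# (sample /etc/group-style data; replace the printf with "getent group")
printf '%s\n' 'mapr:x:5000:' 'domain users:x:10513:afischer,pcurtis' \
  | awk -F: '$1 ~ / / {print $1}'
```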

HiveServer2 takes a long time to start because of get_all_databases

The materialized view registry and cache were introduced in HIVE-14496 for Hive 2.3.0.

The goal of the cache is to avoid parsing and creating logical plans for the materialized views at query runtime. When a query arrives, Hive consults the cache and extracts the already-parsed logical plans for the views. During initialization, the materialized view registry class scans all databases and all tables in each database, which can cause HiveServer2 to take a long time to start.

The hive.materializedview.enable.views.registry property controls the usage of the materialized view registry:

Property: hive.materializedview.enable.views.registry

Default value: true

Description: When Hive contains a large number of databases and tables, the materialized view registry and cache force HiveServer2 to scan all of them in order to cache the query plans for the views, which can make HiveServer2 take an extremely long time to start.

To disable the materialized view registry and cache, add the following to hive-site.xml and restart the Hive services:
<property>
  <name>hive.materializedview.enable.views.registry</name>
  <value>false</value>
</property>

Database and Table Names Containing a Dot (.)

HIVE-16907 rejects queries with database and table names that contain a dot (.), and this behavior is backported to Hive 2.3.

Databases and tables with a dot (.) in the name are no longer supported. For example, the following query:
insert into `tdb.t1` select * from t2;
fails with this error:
FAILED: SemanticException org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:12 Table or database name may not contain dot(.) character 'tdb.t1'

Avoid using unsupported characters in database and table names.