Known Issues

With --hivesrc1 and --hivesrc2:

HiveServer2 leaks memory because it uses old versions of Calcite (1.2.0-incubating and 1.10). See CALCITE-1808, which is fixed in Calcite 1.15. (No such memory leak occurs with --hivesrc3.)

With --hivesrc1:

  • When multiple TaskAttempts run inside a DAGAppMaster or a ContainerWorker in Yarn mode, GroupByOperator miscalculates both the amount of memory assigned to each TaskAttempt and the amount of memory used by a TaskAttempt. As a result, it is hard to predict when GroupByOperator flushes its hash tables.

  • On Hadoop 2.9 and 3.1, the module HadoopShimsSecure may print a warning message like:

    2018-10-28 14:40:52,998 WARN  [main]: shims.HadoopShimsSecure (Hadoop23Shims.java:startPauseMonitor(222)) - Could not initiate the JvmPauseMonitor thread. GCs and Pauses may not be warned upon.
    java.lang.NoSuchMethodError: org.apache.hadoop.util.JvmPauseMonitor.<init>(Lorg/apache/hadoop/conf/Configuration;)V
        at org.apache.hadoop.hive.shims.Hadoop23Shims.startPauseMonitor(Hadoop23Shims.java:218)
        ...
    

    This warning message may be ignored.

With --hivesrc2 and --hivesrc3:

  • When multiple TaskAttempts run inside a DAGAppMaster, GroupByOperator conservatively estimates the amount of memory used by a TaskAttempt. As a result, GroupByOperator flushes hash tables more often than necessary. The user can mitigate this issue by increasing the value of the configuration key hive.map.aggr.hash.force.flush.memory.threshold in hive-site.xml.
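
    As a rough sketch, the entry in hive-site.xml might look like the following. The value 0.95 is only an illustration, not a recommendation. The assumption here is that the threshold is a fraction of available memory and that the value currently in effect is lower (stock Hive is believed to default to 0.9), so check the existing setting before changing it.

    ```xml
    <!-- hive-site.xml: illustrative value only; tune for your workload -->
    <property>
      <name>hive.map.aggr.hash.force.flush.memory.threshold</name>
      <!-- Assumed to be a fraction between 0 and 1; a higher value delays forced flushes -->
      <value>0.95</value>
    </property>
    ```

    A higher threshold lets hash tables grow larger before a forced flush, at the cost of leaving less headroom for other memory consumers in the same process.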