LLAP I/O

Hive-MR3 based on Hive 2.x supports LLAP (Low Latency Analytical Processing) I/O, which was introduced in Hive 2.x. If a ContainerWorker starts with LLAP I/O enabled, it wraps every HiveInputFormat object with an LlapInputFormat object so as to cache all data read via HiveInputFormat. Combined with the ability to execute multiple TaskAttempts concurrently inside a single ContainerWorker, the support for LLAP I/O makes Hive-MR3 functionally equivalent to Hive-LLAP.
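The wrapping described above follows the decorator pattern: the wrapper exposes the same interface as the object it wraps but consults a cache before reading. The sketch below illustrates only that idea. `SplitReader`, `DirectReader`, and `CachingReader` are hypothetical names that do not exist in Hive or MR3, and the actual LlapInputFormat does much more. Also, unlike the unbounded map used here, the LLAP cache is limited by hive.llap.io.memory.size.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual sketch only: all names here are hypothetical.
class CachingReaderDemo {
  public static void main(String[] args) {
    DirectReader direct = new DirectReader();
    SplitReader reader = new CachingReader(direct);
    reader.read("split-0");  // cache miss: goes to (simulated) storage
    reader.read("split-0");  // cache hit: served from the cache
    System.out.println("storage reads: " + direct.storageReads.get());  // prints "storage reads: 1"
  }
}

// Hypothetical stand-in for an input format: returns the data of one split.
interface SplitReader {
  byte[] read(String splitId);
}

// Reads directly from simulated storage and counts how often it is called.
class DirectReader implements SplitReader {
  final AtomicInteger storageReads = new AtomicInteger();

  @Override
  public byte[] read(String splitId) {
    storageReads.incrementAndGet();
    return ("data of " + splitId).getBytes(StandardCharsets.UTF_8);
  }
}

// Wraps another SplitReader and caches every result it returns,
// similar in spirit to wrapping a HiveInputFormat with an LlapInputFormat.
class CachingReader implements SplitReader {
  private final SplitReader delegate;
  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

  CachingReader(SplitReader delegate) {
    this.delegate = delegate;
  }

  @Override
  public byte[] read(String splitId) {
    // computeIfAbsent calls the delegate only on a cache miss
    return cache.computeIfAbsent(splitId, delegate::read);
  }
}
```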

Thanks to the DaemonTasks already available in MR3, LLAP I/O is straightforward to implement in Hive-MR3. If LLAP I/O is enabled, a ContainerGroup creates an MR3 DaemonTask that is responsible for managing LLAP I/O. When a ContainerWorker starts, a DaemonTaskAttempt is created to initialize the LLAP I/O module. Once initialized, the LLAP I/O module works in the background to serve requests from ordinary TaskAttempts. The following code shows the entire implementation of DaemonTaskAttempts for LLAP I/O in Java (excluding the header section):

public class LLAPDaemonProcessor extends AbstractLogicalIOProcessor {
  public LLAPDaemonProcessor(ProcessorContext context) {
    super(context);
  }

  // Called when the DaemonTaskAttempt starts in a ContainerWorker:
  // rebuilds the configuration from the user payload and initializes LLAP I/O.
  @Override
  public void initialize() throws IOException {
    Configuration conf = TezUtils.createConfFromUserPayload(getContext().getUserPayload());
    LlapProxy.initializeLlapIo(conf);
  }

  // Nothing to process: the LLAP I/O module keeps working in the background.
  @Override
  public void run(Map<String, LogicalInput> inputs, Map<String, LogicalOutput> outputs) throws Exception {
  }

  // No events to handle.
  @Override
  public void handleEvents(List<Event> arg0) {
  }

  // Nothing to clean up.
  @Override
  public void close() throws IOException {
  }
}

Since the DaemonTaskAttempt for LLAP I/O does not exchange data or events with any other component, all methods other than initialize() take no action.
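The key property of this design is that initialization happens once per ContainerWorker, after which ordinary TaskAttempts running concurrently in the same ContainerWorker share the module. The following sketch models this with a hypothetical process-wide holder `IoModule`. It is not the implementation of LlapProxy, and the guard against repeated initialization is an assumption of the sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DaemonInitDemo {
  public static void main(String[] args) throws InterruptedException {
    // The DaemonTaskAttempt runs first and initializes the shared module.
    IoModule.initialize();
    // Ordinary TaskAttempts then run concurrently and use the same module.
    ExecutorService taskAttempts = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 8; i++) {
      taskAttempts.submit(() -> IoModule.get().serveRequest());
    }
    taskAttempts.shutdown();
    taskAttempts.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("requests served: " + IoModule.get().served.get());  // prints "requests served: 8"
  }
}

// Hypothetical holder with one instance per ContainerWorker (one Java VM).
final class IoModule {
  private static volatile IoModule instance;
  final AtomicInteger served = new AtomicInteger();

  private IoModule() {}

  // Called from the DaemonTaskAttempt; repeated calls are no-ops.
  static synchronized void initialize() {
    if (instance == null) {
      instance = new IoModule();
    }
  }

  // Called by ordinary TaskAttempts; null means the module was never initialized.
  static IoModule get() {
    return instance;
  }

  void serveRequest() {
    served.incrementAndGet();
  }
}
```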

Hive-MR3 configures LLAP I/O with exactly the same configuration keys that Hive-LLAP uses:

  • hive.llap.io.enabled specifies whether or not to enable LLAP I/O. If set to true, Hive-MR3 attaches an MR3 DaemonTask for LLAP I/O to the single ContainerGroup under the all-in-one scheme and to the Map ContainerGroup under the per-map-reduce scheme.
  • hive.llap.io.memory.size specifies the amount of memory for caching data.
  • hive.llap.io.threadpool.size specifies the number of threads for serving requests in LLAP I/O.
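The rule for hive.llap.io.enabled can be sketched as follows. `Scheme` and `daemonTaskTarget` are hypothetical names used only for illustration, the keys are read from a plain `Map` rather than Hive's configuration classes, and treating a missing hive.llap.io.enabled as false is an assumption of the sketch. The thread pool size in `main` is an arbitrary illustrative value.

```java
import java.util.Map;
import java.util.Optional;

class LlapIoTargetDemo {
  // Hypothetical enumeration of the two ContainerGroup schemes described above.
  enum Scheme { ALL_IN_ONE, PER_MAP_REDUCE }

  // Returns the ContainerGroup that receives the DaemonTask for LLAP I/O,
  // or empty if hive.llap.io.enabled is not true (missing counts as false here).
  static Optional<String> daemonTaskTarget(Map<String, String> conf, Scheme scheme) {
    boolean enabled = Boolean.parseBoolean(conf.getOrDefault("hive.llap.io.enabled", "false"));
    if (!enabled) {
      return Optional.empty();
    }
    switch (scheme) {
      case ALL_IN_ONE:
        return Optional.of("the unique ContainerGroup");
      case PER_MAP_REDUCE:
        return Optional.of("the Map ContainerGroup");
      default:
        throw new IllegalArgumentException("unknown scheme: " + scheme);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of(
        "hive.llap.io.enabled", "true",
        "hive.llap.io.memory.size", "32Gb",
        "hive.llap.io.threadpool.size", "10");  // illustrative value
    // prints "the Map ContainerGroup"
    System.out.println(daemonTaskTarget(conf, Scheme.PER_MAP_REDUCE).orElse("none"));
  }
}
```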

Unlike Hive-LLAP, however, the size of the headroom for Java VM overhead (in megabytes) can be specified explicitly with configuration key hive.mr3.llap.headroom.mb (which is new in Hive-MR3). The following diagram shows the memory composition of ContainerWorkers with LLAP I/O under the all-in-one scheme:

[Diagram: memory composition of a ContainerWorker with LLAP I/O under the all-in-one scheme]
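The memory composition can be summarized with the following formulas. The symbols are introduced only for this summary: $M$ is the memory for all TaskAttempts (hive.mr3.all-in-one.containergroup.memory.mb), $H$ is hive.mr3.llap.headroom.mb, $C$ is hive.llap.io.memory.size, and $f$ is hive.mr3.container.max.java.heap.fraction. The last line is inferred from the two examples that follow rather than stated explicitly; it reduces to $H$ when $f = 1$ and to $M - fM$ when $H = 0$.

```latex
\begin{aligned}
\text{ContainerWorker size} &= M + H + C \\
\text{heap size (-Xmx)} &= f \cdot M \\
\text{memory for Java VM overhead} &= H + (M - f \cdot M) = H + (1 - f)\,M
\end{aligned}
```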

Note that the heap size of the Java VM (set with the -Xmx option) is obtained by multiplying the memory size for all TaskAttempts (e.g., specified with configuration key hive.mr3.all-in-one.containergroup.memory.mb under the all-in-one scheme) by the factor specified with configuration key hive.mr3.container.max.java.heap.fraction. Here are a couple of examples of configuring LLAP I/O when hive.llap.io.enabled is set to true:

  • hive.mr3.all-in-one.containergroup.memory.mb=40960,
    hive.mr3.container.max.java.heap.fraction=1.0f,
    hive.mr3.llap.headroom.mb=8192,
    hive.llap.io.memory.size=32Gb
    Memory for TaskAttempts = 40960MB = 40GB
    ContainerWorker size = 40GB + 8GB + 32GB = 80GB
    Heap size = 40960MB * 1.0 = 40GB
    Memory for Java VM overhead = Headroom size = 8GB
  • hive.mr3.all-in-one.containergroup.memory.mb=40960,
    hive.mr3.container.max.java.heap.fraction=0.8f,
    hive.mr3.llap.headroom.mb=0,
    hive.llap.io.memory.size=40Gb
    Memory for TaskAttempts = 40960MB = 40GB
    ContainerWorker size = 40GB + 0GB + 40GB = 80GB
    Heap size = 40960MB * 0.8 = 32GB
    Memory for Java VM overhead = Memory for TaskAttempts - Heap size = 8GB
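The arithmetic of both examples can be reproduced with a few lines of code. This is a sketch, not part of Hive-MR3: `compute` is a hypothetical helper, 1GB is taken as 1024MB (as in 40960MB = 40GB above), rounding the heap size is an assumption, and the overhead formula (headroom plus the part of the TaskAttempt memory not given to the heap) is inferred from the two examples.

```java
class LlapMemoryDemo {
  public static void main(String[] args) {
    // First example: heap fraction 1.0, 8GB headroom, 32GB cache
    // prints "ContainerWorker=81920MB heap=40960MB overhead=8192MB"
    print(compute(40960, 1.0, 8192, 32 * 1024));
    // Second example: heap fraction 0.8, no headroom, 40GB cache
    // prints "ContainerWorker=81920MB heap=32768MB overhead=8192MB"
    print(compute(40960, 0.8, 0, 40 * 1024));
  }

  // Returns {ContainerWorker size, heap size, Java VM overhead}, all in MB.
  static long[] compute(long taskMemoryMb, double heapFraction, long headroomMb, long cacheMb) {
    long containerMb = taskMemoryMb + headroomMb + cacheMb;
    // Rounding (rather than truncating) is an assumption of this sketch.
    long heapMb = Math.round(taskMemoryMb * heapFraction);
    // Inferred from the examples: headroom plus TaskAttempt memory outside the heap.
    long overheadMb = headroomMb + (taskMemoryMb - heapMb);
    return new long[] {containerMb, heapMb, overheadMb};
  }

  static void print(long[] m) {
    System.out.printf("ContainerWorker=%dMB heap=%dMB overhead=%dMB%n", m[0], m[1], m[2]);
  }
}
```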

In order to use LLAP I/O in Hive-MR3, the jar files for LLAP I/O must be listed explicitly in the configuration key hive.aux.jars.path in hive-site.xml, as shown in the following example:

<property>
  <name>hive.aux.jars.path</name>
  <value>/home/hive/hivejar/apache-hive-2.2.0-bin/lib/hive-llap-common-2.2.0.jar,/home/hive/hivejar/apache-hive-2.2.0-bin/lib/hive-llap-server-2.2.0.jar,/home/hive/hivejar/apache-hive-2.2.0-bin/lib/hive-llap-tez-2.2.0.jar</value>
</property>
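For a different installation directory or Hive version, the same value can be assembled programmatically. The helper below is hypothetical, not part of Hive or MR3, and it assumes that the jars follow the naming pattern module-version.jar seen in the example.

```java
import java.util.List;
import java.util.stream.Collectors;

class AuxJarsDemo {
  // Module names of the three LLAP I/O jars in the example above.
  static final List<String> LLAP_MODULES =
      List.of("hive-llap-common", "hive-llap-server", "hive-llap-tez");

  // Hypothetical helper: builds a comma-separated value for hive.aux.jars.path,
  // assuming the jar names follow the pattern <module>-<version>.jar.
  static String auxJarsPath(String hiveLibDir, String version) {
    String dir = hiveLibDir.endsWith("/")
        ? hiveLibDir.substring(0, hiveLibDir.length() - 1)
        : hiveLibDir;
    return LLAP_MODULES.stream()
        .map(module -> dir + "/" + module + "-" + version + ".jar")
        .collect(Collectors.joining(","));
  }

  public static void main(String[] args) {
    // Prints the same value as in the hive-site.xml example above.
    System.out.println(auxJarsPath("/home/hive/hivejar/apache-hive-2.2.0-bin/lib", "2.2.0"));
  }
}
```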

Since LLAP I/O in Hive-MR3 does not depend on ZooKeeper, the following configuration keys should be set in hive-site.xml so that Hive-MR3 does not attempt to communicate with ZooKeeper:

  • hive.llap.hs2.coordinator.enabled should be set to false.
  • hive.llap.daemon.service.hosts should be set to an empty list.
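A configuration can be checked against these two requirements before starting Hive-MR3. The check below is a hypothetical sketch over a plain `Map` of key/value pairs taken from hive-site.xml. It assumes that a missing or blank hive.llap.daemon.service.hosts counts as an empty list, and it reports a missing hive.llap.hs2.coordinator.enabled because that key is then not explicitly set to false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ZooKeeperFreeCheck {
  // Hypothetical helper: returns a list of problems;
  // an empty list means both keys are set as required above.
  static List<String> check(Map<String, String> conf) {
    List<String> problems = new ArrayList<>();
    String coordinator = conf.get("hive.llap.hs2.coordinator.enabled");
    if (coordinator == null || !coordinator.trim().equalsIgnoreCase("false")) {
      problems.add("hive.llap.hs2.coordinator.enabled should be false");
    }
    // A missing key is treated as an empty list (assumption of this sketch).
    String hosts = conf.getOrDefault("hive.llap.daemon.service.hosts", "");
    if (!hosts.trim().isEmpty()) {
      problems.add("hive.llap.daemon.service.hosts should be empty, found: " + hosts);
    }
    return problems;
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of(
        "hive.llap.hs2.coordinator.enabled", "false",
        "hive.llap.daemon.service.hosts", "");
    List<String> problems = check(conf);
    // prints "OK"
    System.out.println(problems.isEmpty() ? "OK" : String.join("; ", problems));
  }
}
```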