LLAP I/O

Hive 2.x and 3.x running on top of MR3 support LLAP (Low Latency Analytical Processing) I/O. If a ContainerWorker starts with LLAP I/O enabled, it wraps every HiveInputFormat object with an LlapInputFormat object so as to cache all data read via HiveInputFormat. In conjunction with the ability to execute multiple TaskAttempts concurrently inside a single ContainerWorker, the support for LLAP I/O makes Hive on MR3 functionally equivalent to Hive-LLAP.
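The wrapping idea can be shown with a toy decorator. This is a hypothetical sketch, not the actual LlapInputFormat: SplitReader and CachingSplitReader are illustrative names, a "split" is just a string key, and unlike a real cache bounded by hive.llap.io.memory.size, this one never evicts anything. The point is only that every reader is wrapped, and all wrappers in one ContainerWorker share the same cache, so concurrent TaskAttempts benefit from each other's reads.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for HiveInputFormat: returns the rows of one split.
interface SplitReader {
    List<String> read(String split);
}

// Hypothetical stand-in for LlapInputFormat: wraps another reader and
// serves repeated reads of the same split from a shared cache.
final class CachingSplitReader implements SplitReader {
    private final SplitReader delegate;
    // One cache per ContainerWorker, shared by all concurrent TaskAttempts.
    private final ConcurrentMap<String, List<String>> cache;
    // Counts how often the wrapped reader is actually called (cache misses).
    private final AtomicInteger misses;

    CachingSplitReader(SplitReader delegate,
                       ConcurrentMap<String, List<String>> cache,
                       AtomicInteger misses) {
        this.delegate = delegate;
        this.cache = cache;
        this.misses = misses;
    }

    // Creates the cache that all wrappers in one ContainerWorker should share.
    static ConcurrentMap<String, List<String>> newSharedCache() {
        return new ConcurrentHashMap<>();
    }

    @Override
    public List<String> read(String split) {
        // The delegate runs only if the split is not cached yet.
        // This toy cache never evicts entries.
        return cache.computeIfAbsent(split, s -> {
            misses.incrementAndGet();
            return delegate.read(s);
        });
    }
}
```

Two wrappers built over the same shared cache behave like two TaskAttempts in one ContainerWorker: the second read of a split does not reach the wrapped reader.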

Because MR3 already provides DaemonTasks, implementing LLAP I/O in Hive on MR3 is straightforward. If LLAP I/O is enabled, a ContainerGroup creates an MR3 DaemonTask that is responsible for managing LLAP I/O. When a ContainerWorker starts, a DaemonTaskAttempt is created to initialize the LLAP I/O module. Once initialized, the LLAP I/O module works in the background to serve requests from ordinary TaskAttempts. The following Java code is the entire implementation of the DaemonTaskAttempt for LLAP I/O (excluding the header section):

public class LLAPDaemonProcessor extends AbstractLogicalIOProcessor {
  public LLAPDaemonProcessor(ProcessorContext context) {
    super(context);
  }

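  @Override
  public void initialize() throws IOException {
    // Rebuild the Hive configuration from the user payload and start the LLAP I/O module.
    Configuration conf = TezUtils.createConfFromUserPayload(getContext().getUserPayload());
    LlapProxy.initializeLlapIo(conf);
  }

  @Override
  public void run(Map<String, LogicalInput> inputs, Map<String, LogicalOutput> outputs) throws Exception {
    // No-op: after initialize(), the LLAP I/O module serves requests in the background.
  }

  @Override
  public void handleEvents(List<Event> events) {
    // No-op: the LLAP I/O module does not exchange events with other tasks.
  }

  @Override
  public void close() throws IOException {
    // No-op: nothing to release here.
  }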
}

Since the LLAP I/O module takes no inputs, produces no outputs, and does not exchange events with other tasks, all methods other than initialize() take no action.
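The one-time initialization performed by the DaemonTaskAttempt, followed by background service to ordinary TaskAttempts, can be modeled with a per-JVM holder. This is a hypothetical sketch: BackgroundIoService and WorkerServices are illustrative names, not MR3 or Hive classes. In this model, the daemon's initialize() would call initializeOnce, and ordinary TaskAttempts in the same ContainerWorker would call get() to reach the shared module.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical background I/O module; threads mirrors hive.llap.io.threadpool.size.
final class BackgroundIoService {
    private final int threads;

    BackgroundIoService(int threads) {
        this.threads = threads;
    }

    int threads() {
        return threads;
    }
}

// Hypothetical per-process holder: at most one module per ContainerWorker JVM.
final class WorkerServices {
    private static final AtomicReference<BackgroundIoService> IO = new AtomicReference<>();

    private WorkerServices() {}

    // Called from the DaemonTaskAttempt's initialize(); if the module is
    // already running, the existing instance is kept and returned.
    static BackgroundIoService initializeOnce(int threads) {
        return IO.updateAndGet(current ->
            current != null ? current : new BackgroundIoService(threads));
    }

    // Called by ordinary TaskAttempts; null means LLAP I/O is not available.
    static BackgroundIoService get() {
        return IO.get();
    }
}
```

Making initialization idempotent means a repeated call cannot start a second module in the same ContainerWorker.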

Hive on MR3 configures LLAP I/O with exactly the same configuration keys that Hive-LLAP uses:

  • hive.llap.io.enabled specifies whether to enable LLAP I/O. If it is set to true, Hive attaches an MR3 DaemonTask for LLAP I/O to the single ContainerGroup under the all-in-one scheme, or to the Map ContainerGroup under the per-map-reduce scheme.
  • hive.llap.io.memory.size specifies the amount of memory for caching data (e.g., 32Gb).
  • hive.llap.io.threadpool.size specifies the number of threads for serving requests in LLAP I/O.
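The placement rule for the LLAP I/O DaemonTask fits in a few lines. The enum and the returned group labels below are hypothetical; they only encode the rule stated for hive.llap.io.enabled.

```java
// Hypothetical encoding of the two ContainerGroup schemes mentioned above.
enum ContainerGroupScheme { ALL_IN_ONE, PER_MAP_REDUCE }

final class LlapIoPlacement {
    private LlapIoPlacement() {}

    // Returns a label for the ContainerGroup that receives the LLAP I/O
    // DaemonTask, or null when hive.llap.io.enabled is false.
    static String targetContainerGroup(boolean llapIoEnabled, ContainerGroupScheme scheme) {
        if (!llapIoEnabled) {
            return null;
        }
        switch (scheme) {
            case ALL_IN_ONE:
                return "AllInOne"; // the single ContainerGroup
            case PER_MAP_REDUCE:
                return "Map";      // only the Map ContainerGroup
            default:
                throw new IllegalArgumentException("unknown scheme: " + scheme);
        }
    }
}
```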

Unlike in Hive-LLAP, however, the size of the headroom for Java VM overhead (in megabytes) can be specified explicitly with the configuration key hive.mr3.llap.headroom.mb, which is new in Hive on MR3. The following diagram shows the memory composition of ContainerWorkers with LLAP I/O under the all-in-one scheme:

[Figure: memory composition of a ContainerWorker with LLAP I/O under the all-in-one scheme]

Note that the heap size of the Java VM (for the -Xmx option) is obtained by multiplying the memory for all TaskAttempts (e.g., specified with the configuration key hive.mr3.all-in-one.containergroup.memory.mb under the all-in-one scheme) by the factor specified with the configuration key hive.mr3.container.max.java.heap.fraction. Here are two examples of configuring LLAP I/O when hive.llap.io.enabled is set to true:

  • hive.mr3.all-in-one.containergroup.memory.mb=40960,
    hive.mr3.container.max.java.heap.fraction=1.0f,
    hive.mr3.llap.headroom.mb=8192,
    hive.llap.io.memory.size=32Gb
    Memory for TaskAttempts = 40960MB = 40GB
    ContainerWorker size = 40GB + 8GB + 32GB = 80GB
    Heap size = 40960MB * 1.0 = 40GB
    Memory for Java VM overhead = Headroom size = 8GB
  • hive.mr3.all-in-one.containergroup.memory.mb=40960,
    hive.mr3.container.max.java.heap.fraction=0.8f,
    hive.mr3.llap.headroom.mb=0,
    hive.llap.io.memory.size=40Gb
    Memory for TaskAttempts = 40960MB = 40GB
    ContainerWorker size = 40GB + 0GB + 40GB = 80GB
    Heap size = 40960MB * 0.8 = 32GB
    Memory for Java VM overhead = Memory for TaskAttempts - Heap size = 8GB
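The arithmetic in these examples can be checked with a short sketch. LlapMemoryPlan is a hypothetical helper whose inputs correspond to the four configuration keys, with hive.llap.io.memory.size converted to megabytes by the caller. It assumes, as both examples imply, that the LLAP I/O cache lies outside the Java heap, so the Java VM overhead is whatever remains of the ContainerWorker after subtracting the heap and the cache.

```java
// Hypothetical helper reproducing the memory arithmetic of the examples above.
final class LlapMemoryPlan {
    final long containerWorkerMb;
    final long heapMb;
    final long jvmOverheadMb;

    private LlapMemoryPlan(long containerWorkerMb, long heapMb, long jvmOverheadMb) {
        this.containerWorkerMb = containerWorkerMb;
        this.heapMb = heapMb;
        this.jvmOverheadMb = jvmOverheadMb;
    }

    /**
     * @param taskMemoryMb hive.mr3.all-in-one.containergroup.memory.mb
     * @param heapFraction hive.mr3.container.max.java.heap.fraction
     * @param headroomMb   hive.mr3.llap.headroom.mb
     * @param cacheMb      hive.llap.io.memory.size, converted to megabytes
     */
    static LlapMemoryPlan compute(long taskMemoryMb, double heapFraction,
                                  long headroomMb, long cacheMb) {
        // The ContainerWorker holds TaskAttempt memory, headroom, and the LLAP I/O cache.
        long containerWorkerMb = taskMemoryMb + headroomMb + cacheMb;
        // -Xmx is the TaskAttempt memory scaled by the heap fraction (truncated to whole MB).
        long heapMb = (long) (taskMemoryMb * heapFraction);
        // Whatever is neither heap nor cache is left for Java VM overhead,
        // i.e., the headroom plus the TaskAttempt memory not given to the heap.
        long jvmOverheadMb = containerWorkerMb - heapMb - cacheMb;
        return new LlapMemoryPlan(containerWorkerMb, heapMb, jvmOverheadMb);
    }
}
```

For instance, LlapMemoryPlan.compute(40960, 0.8, 0, 40 * 1024) reproduces the second example: an 81920MB (80GB) ContainerWorker with a 32768MB heap and 8192MB left for Java VM overhead.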

In order to use LLAP I/O in Hive on MR3, the jar files for LLAP I/O should be listed explicitly in the configuration key hive.aux.jars.path in hive-site.xml, as shown in the following example:

<property>
  <name>hive.aux.jars.path</name>
  <value>/home/hive/hivejar/apache-hive-2.3.3-bin/lib/hive-llap-common-2.3.3.jar,/home/hive/hivejar/apache-hive-2.3.3-bin/lib/hive-llap-server-2.3.3.jar,/home/hive/hivejar/apache-hive-2.3.3-bin/lib/hive-llap-tez-2.3.3.jar</value>
</property>
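When the Hive installation directory or version changes, the value of hive.aux.jars.path can be generated instead of edited by hand. AuxJarsPath is a hypothetical helper that builds the comma-separated list for the three LLAP jars named in the example; it assumes every jar follows the name-version.jar pattern shown there.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that builds the value of hive.aux.jars.path for LLAP I/O.
final class AuxJarsPath {
    // Artifact names of the three LLAP jars listed in the example.
    private static final List<String> LLAP_JARS =
        List.of("hive-llap-common", "hive-llap-server", "hive-llap-tez");

    private AuxJarsPath() {}

    // Produces "<libDir>/<name>-<version>.jar" for each jar, joined by commas.
    static String build(String hiveLibDir, String hiveVersion) {
        return LLAP_JARS.stream()
            .map(name -> hiveLibDir + "/" + name + "-" + hiveVersion + ".jar")
            .collect(Collectors.joining(","));
    }
}
```

AuxJarsPath.build("/home/hive/hivejar/apache-hive-2.3.3-bin/lib", "2.3.3") returns exactly the value in the example above.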

Since LLAP I/O in Hive on MR3 does not depend on ZooKeeper, the following configuration keys should be set in hive-site.xml so that Hive never attempts to connect to ZooKeeper:

  • hive.llap.hs2.coordinator.enabled should be set to false.
  • hive.llap.daemon.service.hosts should be set to an empty list.
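These two settings can be sanity-checked against a plain map of hive-site.xml properties. LlapIoConfigCheck is a hypothetical helper; it deliberately requires both keys to be set explicitly rather than relying on Hive's defaults, and it checks nothing when hive.llap.io.enabled is not true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical checker for the ZooKeeper-free LLAP I/O settings described above.
final class LlapIoConfigCheck {
    private LlapIoConfigCheck() {}

    // Returns a list of problems; an empty list means the settings look right.
    static List<String> check(Map<String, String> props) {
        List<String> problems = new ArrayList<>();
        if (!"true".equalsIgnoreCase(props.get("hive.llap.io.enabled"))) {
            return problems; // LLAP I/O is off, so there is nothing to check
        }
        String coordinator = props.get("hive.llap.hs2.coordinator.enabled");
        if (coordinator == null || !coordinator.trim().equalsIgnoreCase("false")) {
            problems.add("hive.llap.hs2.coordinator.enabled should be set to false");
        }
        String hosts = props.get("hive.llap.daemon.service.hosts");
        if (hosts == null || !hosts.trim().isEmpty()) {
            problems.add("hive.llap.daemon.service.hosts should be set to an empty list");
        }
        return problems;
    }
}
```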