Hive-MR3 is an extension of Hive that runs on top of MR3. To exploit MR3 features such as running concurrent DAGs in the same ApplicationMaster and sharing containers among DAGs, Hive-MR3 is built on a modified backend of Hive. The modified backend is also compatible with Tez, so the user can readily switch to Hive-on-Tez without installing it separately.
Currently Hive-MR3 calls a Tez runtime to execute Hive queries and relies on MR3 for everything else, including scheduling DAGs, creating containers, messaging, and authenticating and authorizing users. For end users, Hive-MR3 is similar to Hive-on-Tez except for several configuration keys specific to Hive-MR3, so migrating from Hive-on-Tez to Hive-MR3 takes little effort.
There are four versions of Hive-MR3, based on:
- Hive 1.2.2
- Hive 2.1.1
- Hive 2.2.0
- Hive 2.3.3
For Tez runtimes, Hive-MR3 supports:
- Tez 0.7.0 runtime
- Tez 0.8.4 runtime
- Tez 0.9.1 runtime
Hive-MR3 based on Hive 1.2.2 can run with any of these runtimes, whereas Hive-MR3 based on Hive 2.x (2.1.1, 2.2.0, or 2.3.3) can run only with the Tez 0.8.4 or 0.9.1 runtime.
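The compatibility rules above can be summarized as a small lookup table. The following Python sketch is only an illustration and is not part of Hive-MR3; the names `SUPPORTED_TEZ_RUNTIMES` and `is_supported` are hypothetical, and the table simply transcribes the version combinations listed in the text.

```python
# Illustrative sketch only: not part of Hive-MR3.
# Maps each Hive version that Hive-MR3 is based on to the Tez runtimes
# it can run with, as stated in the text above.
SUPPORTED_TEZ_RUNTIMES = {
    "1.2.2": {"0.7.0", "0.8.4", "0.9.1"},
    "2.1.1": {"0.8.4", "0.9.1"},
    "2.2.0": {"0.8.4", "0.9.1"},
    "2.3.3": {"0.8.4", "0.9.1"},
}


def is_supported(hive_version: str, tez_runtime: str) -> bool:
    """Return True if the given Hive base version and Tez runtime are a listed combination."""
    # Unknown Hive versions map to an empty set, so they are never supported.
    return tez_runtime in SUPPORTED_TEZ_RUNTIMES.get(hive_version, set())


if __name__ == "__main__":
    print(is_supported("1.2.2", "0.7.0"))
    print(is_supported("2.3.3", "0.7.0"))
```

For example, `is_supported("2.3.3", "0.7.0")` evaluates to `False` because Hive 2.x builds work only with the Tez 0.8.4 and 0.9.1 runtimes.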
Compared with Hive-on-Tez, Hive-MR3 generally runs sequential queries faster by virtue of the simple architectural design of MR3. For concurrent queries, it utilizes computing resources much more effectively and thus yields higher throughput, because MR3 allows concurrent DAGs in the same ApplicationMaster to share containers.