Hive on MR3
Hive can run on top of MR3. In order to exploit new features in MR3 such as running concurrent DAGs in the same ApplicationMaster and sharing containers among DAGs, Hive on MR3 is built on a modified backend of Hive. (The modified backend of Hive is not compatible with Tez.)
Currently, Hive on MR3 calls a Tez runtime to execute Hive queries and relies on MR3 for everything else, such as scheduling DAGs, creating containers, messaging, and authenticating and authorizing users. For end users, Hive on MR3 is similar to Hive on Tez except for several configuration keys specific to MR3, so migrating from Hive on Tez to Hive on MR3 takes little effort.
There are four versions of Hive that run on MR3:
- Hive 1.2.2
- Hive 2.3.5
- Hive 3.1.1
- Hive 4.0.0-SNAPSHOT
Hive 2.1.1 and 2.2.0 also run on MR3, but are not included in the MR3 release.
For the Tez runtime, Hive on MR3 uses:
- Tez 0.9.1 runtime with additional patches applied
All four versions of Hive listed above run on MR3 with the Tez 0.9.1 runtime.
Compared with Hive on Tez, Hive on MR3 generally runs faster on sequential queries thanks to the simple architectural design of MR3. It also makes much better use of computing resources and thus yields higher throughput on concurrent queries, because MR3 allows concurrent DAGs in the same ApplicationMaster to share containers.
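The effect of sharing containers among concurrent DAGs can be illustrated with a toy model. The sketch below is not MR3's scheduler: it ignores dependencies between tasks, assumes unit-length tasks, and uses a hypothetical workload of two DAGs (`dag_a`, `dag_b`) and a made-up helper `makespan`. It only shows why a shared pool of containers can finish the same work sooner than isolated pools of the same total size.

```python
import heapq

def makespan(task_durations, num_containers):
    """Greedy scheduling: each task runs on the container that becomes idle first.

    Returns the time at which the last task finishes. Task dependencies
    inside a DAG are ignored (simplifying assumption).
    """
    free_at = [0] * num_containers  # time at which each container becomes idle
    heapq.heapify(free_at)
    for duration in task_durations:
        start = heapq.heappop(free_at)          # earliest idle container
        heapq.heappush(free_at, start + duration)
    return max(free_at)

# Hypothetical workload: two concurrent DAGs with unit-length tasks.
dag_a = [1] * 8   # a large query
dag_b = [1] * 2   # a small query

# Isolated: each DAG owns 2 containers; idle containers cannot help the other DAG.
isolated = max(makespan(dag_a, 2), makespan(dag_b, 2))

# Shared: both DAGs draw tasks from a single pool of 4 containers.
shared = makespan(dag_a + dag_b, 4)

print(isolated, shared)  # prints "4 3"
```

In the isolated case, the two containers reserved for the small DAG sit idle after one time unit while the large DAG is still running. With a shared pool, those containers pick up the remaining tasks of the large DAG, which is the intuition behind the higher throughput on concurrent queries.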