In Hive on MR3, HiveServer2 runs in either shared session mode or individual session mode.
In shared session mode, HiveServer2 maintains a single session (by creating an MR3Session object) to be shared by all Beeline connections. This session creates a DAGAppMaster to serve all Hive queries submitted through Beeline connections, and DAGs generated from such Hive queries can send their TaskAttempts to any ContainerWorker. As a result, all Beeline connections share the entire pool of ContainerWorkers through a common DAGAppMaster.
In individual session mode, each Beeline connection creates its own session (by creating a new MR3Session object), which is not shared with any other Beeline connection. Each session creates its own DAGAppMaster, which maintains its own pool of ContainerWorkers that are not visible to other sessions. As a result, there is no sharing of ContainerWorkers between Beeline connections.
In general, shared session mode achieves better utilization of computing resources because ContainerWorkers can serve any TaskAttempt from any DAG. The use of a common DAGAppMaster, however, implies that all Hive queries are assigned a fixed priority. In contrast, individual session mode allows each Beeline connection to specify the YARN queue for its DAGAppMaster, so we can in effect assign a different priority to each Beeline connection (at the cost of lower utilization of computing resources).
Shared session mode is enabled if hive.server2.mr3.share.session is set to true in the Hive configuration.
By default, it is set to false and individual session mode is used.
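As a minimal sketch, the setting might look like the fragment below. It assumes that the property is placed in a Hadoop-style XML configuration file read by HiveServer2 (typically hive-site.xml, which is an assumption here because the file is not named above) and that the fragment sits inside that file's existing configuration element.

```xml
<!-- Assumed location: the Hive configuration file read by HiveServer2 -->
<property>
  <name>hive.server2.mr3.share.session</name>
  <!-- true: shared session mode; false (default): individual session mode -->
  <value>true</value>
</property>
```

Because the mode determines how HiveServer2 creates sessions, the value should be in place before HiveServer2 starts; whether it can be changed at runtime is not stated here.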