Running HiveServer2

To run HiveServer2, use the HiveServer2 included in the MR3 release. Any client program, however, can connect to it, including clients that are not part of the MR3 release. In a multi-user environment, an administrator user (e.g., hive) typically starts HiveServer2.

In order to run HiveServer2, set the following environment variables in env.sh as necessary:

HIVE1_SERVER2_HOST=$HOSTNAME
HIVE1_SERVER2_PORT=9812

HIVE2_SERVER2_HOST=$HOSTNAME
HIVE2_SERVER2_PORT=9822

HIVE5_SERVER2_HOST=$HOSTNAME
HIVE5_SERVER2_PORT=9852

HIVE_SERVER2_HEAPSIZE=16384

HIVE_SERVER2_AUTHENTICATION=NONE
HIVE_SERVER2_KERBEROS_PRINCIPAL=gitlab-runner/_HOST@RED
HIVE_SERVER2_KERBEROS_KEYTAB=/home/gitlab-runner/gitlab-runner.keytab

Note that env.sh specifies a HiveServer2 address (host and port) for each version of Hive because of the incompatibility between different versions of HiveServer2.

  • HIVE_SERVER2_HEAPSIZE specifies the heap size (in megabytes) for HiveServer2.
  • HIVE_SERVER2_AUTHENTICATION specifies the authentication option for HiveServer2, which is one of NONE, NOSASL, KERBEROS, LDAP, PAM, or CUSTOM. It corresponds to the configuration key hive.server2.authentication in hive-site.xml.
  • HIVE_SERVER2_KERBEROS_PRINCIPAL and HIVE_SERVER2_KERBEROS_KEYTAB specify the principal and keytab file for HiveServer2, and correspond to configuration keys hive.server2.authentication.kerberos.principal and hive.server2.authentication.kerberos.keytab in hive-site.xml.
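Because env.sh keeps one address per Hive version, a script that talks to HiveServer2 has to pick the pair matching the version in use. The following is a minimal sketch of such a lookup; the function name hs2_address is hypothetical and not part of the MR3 release, and the variable values simply mirror the env.sh listing above (with localhost as an assumed fallback when HOSTNAME is unset).

```shell
# Settings as they might appear in env.sh (values taken from the listing above).
HIVE1_SERVER2_HOST="${HOSTNAME:-localhost}"
HIVE1_SERVER2_PORT=9812
HIVE2_SERVER2_HOST="${HOSTNAME:-localhost}"
HIVE2_SERVER2_PORT=9822
HIVE5_SERVER2_HOST="${HOSTNAME:-localhost}"
HIVE5_SERVER2_PORT=9852

# Hypothetical helper (not part of MR3): print host:port for a Hive version.
# $1 = version number matching --hivesrc1, --hivesrc2, or --hivesrc5.
hs2_address() {
  case "$1" in
    1) echo "${HIVE1_SERVER2_HOST}:${HIVE1_SERVER2_PORT}" ;;
    2) echo "${HIVE2_SERVER2_HOST}:${HIVE2_SERVER2_PORT}" ;;
    5) echo "${HIVE5_SERVER2_HOST}:${HIVE5_SERVER2_PORT}" ;;
    *) echo "unknown Hive version: $1" >&2; return 1 ;;
  esac
}

# Usage: print the address that corresponds to --hivesrc2.
hs2_address 2
```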

In order to start HiveServer2, execute hive/hiveserver2-service.sh with the following options:

start                     # Start HiveServer2 on port defined in HIVE?_SERVER2_PORT.
stop                      # Stop HiveServer2 on port defined in HIVE?_SERVER2_PORT.
restart                   # Restart HiveServer2 on port defined in HIVE?_SERVER2_PORT.
--local                   # Run jobs with configurations in conf/local/.
--cluster                 # Run jobs with configurations in conf/cluster/ (default).
--mysql                   # Run jobs with configurations in conf/mysql/.
--tpcds                   # Run jobs with configurations in conf/tpcds/.
--hivesrc1                # Choose hive1-mr3 (based on Hive 1.2.2) (default).
--hivesrc2                # Choose hive2-mr3 (based on Hive 2.3.3).
--hivesrc5                # Choose hive5-mr3 (based on Hive 3.0.0).
--tezsrc1                 # Choose tez1-mr3 (based on Tez 0.7.0) (default).
--tezsrc3                 # Choose tez3-mr3 (based on Tez 0.9.1).
--amprocess               # Run the MR3 DAGAppMaster in LocalProcess mode.
--hiveconf <key>=<value>  # Add a configuration key/value.
<HiveServer2 option>      # Add a HiveServer2 option.

  • With --amprocess, HiveServer2 runs every MR3 DAGAppMaster in LocalProcess mode. Hence, each time HiveServer2 starts a DAGAppMaster, it creates a new process on the same machine. If HiveServer2 runs in shared session mode, it creates this process immediately. (Currently --amprocess cannot be used in a secure cluster with Kerberos; see the documentation on LocalProcess mode.)
  • The user can append any number of HiveServer2 options (those accepted by the command hive --service hiveserver2 in Hive) to the command.

When executing hive/hiveserver2-service.sh, it is best to reuse the same options used for hive/metastore-service.sh. For example, in order to connect to a Metastore started with --mysql --hivesrc2, execute hive/hiveserver2-service.sh with --mysql --hivesrc2 as well. Otherwise mismatches in the Hive version and configuration values may lead to errors that are hard to diagnose.

Here are a few examples of running the script:

# start HiveServer2 that starts a new DAGAppMaster for each Beeline connection
hive/hiveserver2-service.sh start --local --hivesrc2 --tezsrc3 --hiveconf hive.server2.mr3.share.session=false

# start HiveServer2 that starts a common DAGAppMaster for all Beeline connections
hive/hiveserver2-service.sh start --local --hivesrc2 --tezsrc3 --hiveconf hive.server2.mr3.share.session=true

# start HiveServer2 that starts a common DAGAppMaster in LocalProcess mode 
hive/hiveserver2-service.sh start --mysql --hivesrc2 --tezsrc3 --amprocess --hiveconf hive.server2.mr3.share.session=true
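Once HiveServer2 is running, a client such as Beeline can connect to the address configured in env.sh. The sketch below builds a JDBC URL for that purpose; the function name hs2_jdbc_url is hypothetical, and the optional third argument assumes Kerberos authentication, in which case the HiveServer2 principal is appended to the URL.

```shell
# Hypothetical helper (not part of MR3): build a JDBC URL for HiveServer2.
# $1 = host, $2 = port, $3 = optional Kerberos principal of HiveServer2.
hs2_jdbc_url() {
  if [ -n "${3:-}" ]; then
    # With Kerberos, the server principal is passed as a session variable.
    printf 'jdbc:hive2://%s:%s/;principal=%s\n' "$1" "$2" "$3"
  else
    printf 'jdbc:hive2://%s:%s/\n' "$1" "$2"
  fi
}

# Usage (assumes beeline is on PATH and HiveServer2 listens on port 9822):
#   beeline -u "$(hs2_jdbc_url "$HOSTNAME" 9822)"
hs2_jdbc_url localhost 9822
```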

Executing hive/hiveserver2-service.sh creates a new directory under hive/hiveserver2-service-result:

hive-mr3--2018-03-12--17-07-13-babdc6b3/
├── command
├── conf
│   ├── beeline-log4j2.properties
...
│   └── yarn-site.xml
├── env
└── hive-logs
    ├── hive.log
    └── out-hiveserver2.txt

The name of the HiveServer2 output directory ends with a random sequence such as babdc6b3.

  • command contains the command executed to start HiveServer2.
  • conf is a directory containing all configuration files that are effective at the time of starting HiveServer2.
  • env lists all environment variables that are effective at the time of starting HiveServer2.
  • hive-logs/hive.log is the log file for HiveServer2.
  • hive-logs/out-hiveserver2.txt is the output of hive/hiveserver2-service.sh.
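When troubleshooting, it is often convenient to jump straight to the output directory of the most recent run. The following sketch does so under the assumption that all directory names share the same prefix and embed a zero-padded timestamp, as in the example above, so that sorting by name also sorts by time. The function name latest_hs2_dir is hypothetical.

```shell
# Hypothetical helper (not part of MR3): print the newest output directory.
# $1 = base directory, e.g. hive/hiveserver2-service-result.
latest_hs2_dir() {
  latest=""
  # The shell expands the glob in sorted order, so the last match is the
  # one with the latest timestamp in its name.
  for d in "$1"/*/; do
    [ -d "$d" ] || continue
    latest="${d%/}"
  done
  if [ -n "$latest" ]; then
    echo "$latest"
  else
    return 1
  fi
}

# Usage: show the tail of the HiveServer2 log from the most recent run.
#   tail -n 50 "$(latest_hs2_dir hive/hiveserver2-service-result)/hive-logs/hive.log"
```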

For HiveServer2 started with --amprocess, every MR3 DAGAppMaster (which runs in a process on the same machine) creates a new directory named after its application ID under the HiveServer2 output directory. Typically the DAGAppMaster output directory contains the log file for the DAGAppMaster, stderr output, and stdout output, as shown in the following example:

application_1516622736564_1439/
├── run.log
├── stderr
└── stdout
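To inspect all DAGAppMasters started by one HiveServer2 instance, one can enumerate these directories. The sketch below assumes the layout shown above (directories named application_* that contain run.log); the function name list_am_logs is hypothetical.

```shell
# Hypothetical helper (not part of MR3): list DAGAppMaster log files.
# $1 = a HiveServer2 output directory created by hive/hiveserver2-service.sh.
list_am_logs() {
  for d in "$1"/application_*/; do
    # Skip the unexpanded pattern and directories without a log file.
    if [ -f "${d}run.log" ]; then
      echo "${d}run.log"
    fi
  done
}

# Usage (the directory name is the example from this page):
#   list_am_logs hive/hiveserver2-service-result/hive-mr3--2018-03-12--17-07-13-babdc6b3
```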