Running HiveCLI

To run HiveCLI, use the version of HiveCLI included in the MR3 release.

In order to run HiveCLI, set the following environment variables in env.sh as necessary:

USER_PRINCIPAL=gitlab-runner@RED
USER_KEYTAB=/home/gitlab-runner/gitlab-runner.keytab

HIVE_CLIENT_HEAPSIZE=2048

  • USER_PRINCIPAL and USER_KEYTAB specify the principal and keytab file for the user executing HiveCLI in a secure cluster with Kerberos.
  • HIVE_CLIENT_HEAPSIZE specifies the heap size (in megabytes) for HiveCLI.

To start a HiveCLI session, execute hive/run-hive-cli.sh with the following options:

--local                   # Run jobs with configurations in conf/local/.
--cluster                 # Run jobs with configurations in conf/cluster/ (default).
--mysql                   # Run jobs with configurations in conf/mysql/.
--tpcds                   # Run jobs with configurations in conf/tpcds/.
--hivesrc1                # Choose hive1-mr3 (based on Hive 1.2.2) (default).
--hivesrc2                # Choose hive2-mr3 (based on Hive 2.3.3).
--hivesrc5                # Choose hive5-mr3 (based on Hive 3.0.0).
--tezsrc1                 # Choose tez1-mr3 (based on Tez 0.7.0) (default).
--tezsrc3                 # Choose tez3-mr3 (based on Tez 0.9.1).
--amprocess               # Run the MR3 DAGAppMaster in LocalProcess mode.
--hiveconf <key>=<value>  # Add a configuration key/value; may be repeated at the end.
<HiveCLI option>          # Add a HiveCLI option; may be repeated at the end.

With --amprocess, HiveCLI starts a new MR3 DAGAppMaster in LocalProcess mode, that is, as a new process on the same machine where the script is run. (Currently --amprocess cannot be used in a secure cluster with Kerberos; see the documentation on LocalProcess mode.)

The user can append as many HiveCLI options (for the command hive from Hive) as necessary to the command.
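As an illustration of how the options combine, the following sketch wraps the script in a small shell function. The function name run_hive_cli and the variable HIVE_CLI_SCRIPT are hypothetical and not part of MR3; the pairing of --hivesrc2 with --tezsrc3 is chosen only for illustration; hive.cli.print.header is a standard Hive configuration key. Script options come first, followed by --hiveconf pairs and then any HiveCLI options.

```shell
#!/bin/bash
# Sketch only: run_hive_cli and HIVE_CLI_SCRIPT are hypothetical names, not part of MR3.
run_hive_cli() {
  # Path to the script; assumed to be relative to the MR3 release directory.
  local script="${HIVE_CLI_SCRIPT:-hive/run-hive-cli.sh}"
  # Script options first, then --hiveconf pairs, then HiveCLI options ("$@").
  "$script" --cluster --hivesrc2 --tezsrc3 \
    --hiveconf hive.cli.print.header=true \
    "$@"
}

# Example (run from the MR3 release directory):
#   run_hive_cli -e 'show databases;'
```

Here -e is the usual HiveCLI option for executing a query string; any other HiveCLI option can be passed the same way.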

In a secure cluster with Kerberos, HiveCLI uses the Kerberos ticket provided by the user to authenticate itself to Yarn. Hence the Kerberos ticket should be valid at the time of executing the script. In a non-secure cluster without Kerberos, HiveCLI communicates with Yarn as the user executing the script.
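Since the ticket must be valid when the script starts, it can help to check it beforehand. The sketch below is one possible way to do so with the standard Kerberos tools klist and kinit. The helper name ensure_kerberos_ticket is hypothetical, and whether run-hive-cli.sh itself obtains a ticket from USER_PRINCIPAL and USER_KEYTAB is not stated here, so treat this as an optional precaution.

```shell
#!/bin/bash
# Sketch only: ensure_kerberos_ticket is a hypothetical helper, not part of MR3.
ensure_kerberos_ticket() {
  local principal="$1" keytab="$2"
  # klist -s prints nothing; it exits non-zero if the credentials cache
  # cannot be read or the ticket has expired.
  if klist -s; then
    return 0
  fi
  # No valid ticket: obtain one from the keytab file.
  kinit -kt "$keytab" "$principal"
}

# Example, using the values from env.sh:
#   ensure_kerberos_ticket "$USER_PRINCIPAL" "$USER_KEYTAB" && hive/run-hive-cli.sh --cluster
```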

Executing hive/run-hive-cli.sh creates a new directory under hive/run-hive-cli-result:

hive-mr3--2018-03-14--00-13-40-8f4aaf03/
├── command
├── conf
│   ├── beeline-log4j.properties
...
│   └── yarn-site.xml
├── env
├── hive-logs
│   └── hive.log
└── out.txt

The name of the HiveCLI output directory ends with a random sequence such as 8f4aaf03.

  • command contains the command executed to start the HiveCLI session.
  • conf is a directory containing all configuration files that are effective at the time of starting the HiveCLI session.
  • env lists all environment variables that are effective at the time of starting the HiveCLI session.
  • hive-logs/hive.log is the log file for HiveCLI.
  • out.txt is the output of hive/run-hive-cli.sh.
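To inspect the files of the most recent session, one option is to pick the output directory whose name sorts last. This is a sketch that assumes every directory name follows the hive-mr3--<date>--<time>-<random> pattern shown above, so that lexical order matches chronological order; the function name latest_hivecli_dir is hypothetical.

```shell
#!/bin/bash
# Sketch only: latest_hivecli_dir is a hypothetical helper, not part of MR3.
# Prints the most recent HiveCLI output directory under the given root.
latest_hivecli_dir() {
  local root="${1:-hive/run-hive-cli-result}"
  local d latest=""
  # Glob results are sorted by name, so the last match carries the latest timestamp.
  for d in "$root"/hive-mr3--*/; do
    [ -d "$d" ] || continue   # skip the unexpanded pattern when nothing matches
    latest="${d%/}"
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example (run from the MR3 release directory):
#   less "$(latest_hivecli_dir)/hive-logs/hive.log"
```

The function returns a non-zero status and prints nothing if no output directory exists yet.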