Running Metastore

Hive-MR3 can run only if Metastore is running. Any version of Metastore works with Hive-MR3, not only those included in a Hive-MR3 release. For example, if Metastore is already running, the user may reuse it without starting another instance. In a multi-user environment, the administrator user (e.g., hive) typically starts Metastore.

To run the Metastore included in the Hive-MR3 release, set the following environment variables in env.sh as necessary:

HIVE1_METASTORE_HOST=$HOSTNAME
HIVE1_METASTORE_PORT=9810
HIVE1_METASTORE_LOCAL_PORT=9811
HIVE1_DATABASE_NAME=hivemr3
HIVE1_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse

HIVE2_METASTORE_HOST=$HOSTNAME
HIVE2_METASTORE_PORT=9820
HIVE2_METASTORE_LOCAL_PORT=9821
HIVE2_DATABASE_NAME=hive2mr3
HIVE2_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse

HIVE3_METASTORE_HOST=$HOSTNAME
HIVE3_METASTORE_PORT=9830
HIVE3_METASTORE_LOCAL_PORT=9831
HIVE3_DATABASE_NAME=hive3mr3
HIVE3_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse

HIVE_METASTORE_HEAPSIZE=12288

HIVE_SECURE_MODE=false
HIVE_METASTORE_KERBEROS_PRINCIPAL=hive/_HOST@RED
HIVE_METASTORE_KERBEROS_KEYTAB=/etc/security/keytabs/hive.service.keytab

HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar

Note that env.sh specifies a Metastore address (host and port) for each version of Hive because of the incompatibility between different versions of Metastore.

  • HIVE1_METASTORE_LOCAL_PORT specifies the port for Metastore running in local mode (in which everything runs on a single machine) with --hivesrc1. If the user does not need Hive-MR3 in local mode, this environment variable may be ignored.
  • HIVE1_DATABASE_NAME specifies the database name for Metastore running with --hivesrc1.
  • HIVE1_HDFS_WAREHOUSE specifies the directory for the Hive warehouse on HDFS for Metastore running in non-local mode with --hivesrc1. For local mode, Hive-MR3 creates a Hive warehouse under the installation directory. Note that different versions of Metastore can share the same Hive warehouse, while their databases cannot be shared.
  • The corresponding HIVE2_* and HIVE3_* variables play the same roles for --hivesrc2 and --hivesrc3.

  • HIVE_METASTORE_HEAPSIZE specifies the heap size (in megabytes) for Metastore.
  • HIVE_SECURE_MODE specifies whether SASL in Metastore is enabled or not.
  • HIVE_METASTORE_KERBEROS_PRINCIPAL and HIVE_METASTORE_KERBEROS_KEYTAB specify the principal and keytab file for Metastore, and correspond to configuration keys hive.metastore.kerberos.principal and hive.metastore.kerberos.keytab.file in hive-site.xml.
  • HIVE_MYSQL_DRIVER specifies the path to a MySQL connector jar file which is necessary when using a MySQL database. One can download the official JDBC driver for MySQL at https://dev.mysql.com/downloads/connector/j/.
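
As an illustration of how the per-version host and port variables fit together, Hive clients usually locate Metastore through a Thrift URI of the form thrift://host:port (the hive.metastore.uris key in hive-site.xml). The sketch below builds such a URI from the variables listed above. The function name metastore_uri is hypothetical and not part of Hive-MR3, the host name localhost stands in for $HOSTNAME, and whether the Hive-MR3 scripts derive the URI in exactly this way is an assumption.

```shell
# Hypothetical helper, not part of the Hive-MR3 scripts.
# Values mirror the env.sh example; localhost stands in for $HOSTNAME.
HIVE1_METASTORE_HOST=localhost
HIVE1_METASTORE_PORT=9810
HIVE2_METASTORE_HOST=localhost
HIVE2_METASTORE_PORT=9820
HIVE3_METASTORE_HOST=localhost
HIVE3_METASTORE_PORT=9830

# Print the Thrift URI for the given Hive version (1, 2, or 3).
metastore_uri() {
  ver=$1
  case "$ver" in
    1|2|3) ;;
    *) echo "usage: metastore_uri <1|2|3>" >&2; return 1 ;;
  esac
  # Look up HIVE<ver>_METASTORE_HOST and HIVE<ver>_METASTORE_PORT by name.
  eval "host=\${HIVE${ver}_METASTORE_HOST}"
  eval "port=\${HIVE${ver}_METASTORE_PORT}"
  echo "thrift://${host}:${port}"
}

metastore_uri 3
```

With the values above, the last line prints thrift://localhost:9830, the address that matches --hivesrc3.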

In order to start Metastore, execute hive/metastore-service.sh with the following options:

start                     # Start Metastore on port defined in HIVE?_METASTORE_PORT.
stop                      # Stop Metastore on port defined in HIVE?_METASTORE_PORT.
restart                   # Restart Metastore on port defined in HIVE?_METASTORE_PORT.
--local                   # Run jobs with configurations in conf/local/.
--cluster                 # Run jobs with configurations in conf/cluster/ (default).
--mysql                   # Run jobs with configurations in conf/mysql/.
--tpcds                   # Run jobs with configurations in conf/tpcds/.
--hivesrc1                # Choose hive1-mr3 (based on Hive 1.2.2) (default).
--hivesrc2                # Choose hive2-mr3 (based on Hive 2.3.2).
--hivesrc3                # Choose hive3-mr3 (based on Hive 2.1.1).
--init-schema             # Initialize the database schema.
--hiveconf <key>=<value>  # Add a configuration key/value.
<Metastore option>        # Add a Metastore option.

  • The user should use --init-schema to initialize the database schema when running Metastore for the first time. Otherwise, the script may fail with the following error message in the log:
    MetaException(message:Version information not found in metastore. )

    Initializing the database schema is also necessary for enabling ACID transactions in Hive-MR3.

  • If the database becomes corrupt, the user should delete it manually before restarting Metastore. For a Derby database, the user can simply delete the corresponding database directory, for example:
    rm -rf hive/hive-local-data/metastore/hive2mr3
    rm -rf hive/hive-local-data/metastore-cluster/hive3mr3

    For a MySQL database, the user should connect to the MySQL server and execute a command to delete it.

  • The user can append as many Metastore options (for the command hive --service metastore from Hive) as necessary to the command.
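
For the MySQL case mentioned above, deleting the database usually comes down to a single DROP DATABASE statement issued through the mysql client. The sketch below only prints such a command so that it can be reviewed before being run. The helper name print_drop_command is hypothetical, the user name hivemr3 is taken from the hive-site.xml example in this section, and it is an assumption that the MySQL database carries the name set in HIVE?_DATABASE_NAME.

```shell
# Hypothetical helper, not part of Hive-MR3: print (do not run) a mysql
# command that drops a Metastore database.
# Assumption: the MySQL database is named after HIVE?_DATABASE_NAME in env.sh.
print_drop_command() {
  user=$1
  database=$2
  printf 'mysql -u %s -p -e "DROP DATABASE IF EXISTS %s;"\n' "$user" "$database"
}

print_drop_command hivemr3 hive3mr3
```

The -p flag makes the mysql client prompt for the password instead of taking it on the command line. After the database is dropped, Metastore presumably needs --init-schema again on its next start, as for a first run.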

To see the type of the database used by Metastore, find the configuration key javax.jdo.option.ConnectionDriverName in hive-site.xml. For example, with --tpcds, Metastore uses a MySQL database:

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

If the configuration key javax.jdo.option.ConnectionDriverName is missing in hive-site.xml, Metastore uses a Derby database by default, as is the case when starting Metastore with either --local or --cluster. With --mysql and --tpcds, it uses a MySQL database.
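
The lookup described above can be sketched as a small shell function that reports the configured JDBC driver, or the Derby default when the key is absent. The function name metastore_driver is hypothetical, and the text-based matching assumes that hive-site.xml is laid out like the example above (it is not a full XML parser).

```shell
# Hypothetical helper: report the JDBC driver configured in a hive-site.xml file.
# Prints "derby (default)" when ConnectionDriverName is not set.
metastore_driver() {
  file=$1
  # Find the ConnectionDriverName <name> element, then take the first
  # <value> element on the same line or on a following line.
  driver=$(awk '
    /<name>javax\.jdo\.option\.ConnectionDriverName<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }' "$file")
  if [ -z "$driver" ]; then
    echo "derby (default)"
  else
    echo "$driver"
  fi
}
```

For example, metastore_driver conf/tpcds/hive-site.xml would be expected to print com.mysql.jdbc.Driver, assuming that hive-site.xml resides directly under the configuration directory selected by --tpcds.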

In order to use a MySQL database, the user (who starts Metastore) should have access to the database with a user name and a password, which should be explicitly set in hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hivemr3</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>

Here are examples of starting Metastore for the first time:

hive/metastore-service.sh --local --hivesrc1
hive/metastore-service.sh --mysql --hivesrc2 --init-schema
hive/metastore-service.sh --tpcds --hivesrc3 --init-schema

By default, the log file for starting Metastore is written to /tmp/<user name>/hive.log. Below is an example of messages printed to the log file when Metastore starts successfully:

2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: Started the new metaserver on port [9830]...
2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: Options.minWorkerThreads = 200
2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: Options.maxWorkerThreads = 1000
2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: TCP keepalive = true

Note that all instances of Metastore started by the same user share the same log file.
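
A quick way to confirm a successful start is to search the log for the startup message shown above. The following is a minimal sketch; the function name metastore_started is hypothetical and not part of Hive-MR3.

```shell
# Hypothetical helper: succeed if the log contains the startup message
# for a Metastore listening on the given port.
metastore_started() {
  log=$1
  port=$2
  # -F matches the string literally, so the brackets are not a regex class.
  grep -F -q "Started the new metaserver on port [${port}]" "$log"
}
```

A possible invocation is metastore_started "/tmp/$(id -un)/hive.log" 9830 && echo "Metastore is up", assuming that <user name> in the log path is the name reported by id -un. Because all instances started by the same user share the log file, a match may come from an earlier run, so the timestamp of the matching line is worth checking as well.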