Running Metastore

Hive can run only if Metastore is running. Hive on MR3 can run with any instance of Metastore of the same version, not necessarily the one included in the MR3 release. For example, if Metastore is already running in a Hadoop cluster, the user may reuse it without starting another instance. We recommend, however, the Metastore included in the MR3 release because it includes a few improvements (e.g., https://github.com/apache/hive/pull/454). In a multi-user environment, the administrator user (e.g., hive) typically starts Metastore.

In order to run Metastore included in the MR3 release, set the following environment variables in env.sh as necessary:

HIVE1_DATABASE_HOST=$HOSTNAME
HIVE1_METASTORE_HOST=$HOSTNAME
HIVE1_METASTORE_PORT=9810
HIVE1_METASTORE_LOCAL_PORT=9811
HIVE1_DATABASE_NAME=hivemr3
HIVE1_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse

HIVE2_DATABASE_HOST=$HOSTNAME
HIVE2_METASTORE_HOST=$HOSTNAME
HIVE2_METASTORE_PORT=9820
HIVE2_METASTORE_LOCAL_PORT=9821
HIVE2_DATABASE_NAME=hive2mr3
HIVE2_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse

HIVE5_DATABASE_HOST=$HOSTNAME
HIVE5_METASTORE_HOST=$HOSTNAME
HIVE5_METASTORE_PORT=9850
HIVE5_METASTORE_LOCAL_PORT=9851
HIVE5_DATABASE_NAME=hive5mr3
HIVE5_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse

HIVE_METASTORE_HEAPSIZE=12288

HIVE_METASTORE_KERBEROS_PRINCIPAL=hive/_HOST@RED
HIVE_METASTORE_KERBEROS_KEYTAB=/etc/security/keytabs/hive.service.keytab

HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar

Note that env.sh specifies a Metastore address (host and port) for each version of Hive because of the incompatibility between different versions of Metastore.

  • HIVE1_DATABASE_HOST specifies the host where the database for Metastore is running, whereas HIVE1_METASTORE_HOST specifies the host where Metastore itself is running.
  • HIVE1_METASTORE_LOCAL_PORT specifies the port for Metastore running in local mode (in which everything runs on a single machine) with --hivesrc1. If the user does not need Hive on MR3 in local mode, this environment variable may be ignored.
  • HIVE1_DATABASE_NAME specifies the database name for Metastore running with --hivesrc1.
  • HIVE1_HDFS_WAREHOUSE specifies the directory for the Hive warehouse on HDFS for Metastore running in non-local mode with --hivesrc1. For local mode, Hive on MR3 creates a Hive warehouse under the installation directory. Note that different versions of Metastore can share the same Hive warehouse, while their databases cannot be shared.
  • The HIVE2_* and HIVE5_* variables play the same roles for --hivesrc2 and --hivesrc5, respectively.

  • HIVE_METASTORE_HEAPSIZE specifies the heap size (in megabytes) for Metastore.
  • HIVE_METASTORE_KERBEROS_PRINCIPAL and HIVE_METASTORE_KERBEROS_KEYTAB specify the principal and keytab file for Metastore, and correspond to configuration keys hive.metastore.kerberos.principal and hive.metastore.kerberos.keytab.file in hive-site.xml.
  • HIVE_MYSQL_DRIVER specifies the path to a MySQL connector jar file which is necessary when using a MySQL database. One can download the official JDBC driver for MySQL at https://dev.mysql.com/downloads/connector/j/.
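To illustrate how the per-version variables fit together, the following sketch derives a Metastore address from the host and port variables. The helper name metastore_uri is hypothetical and not part of the MR3 release; the thrift://host:port form is the one accepted by the standard Hive key hive.metastore.uris. How env.sh and metastore-service.sh actually consume these variables may differ.

```shell
# Hypothetical helper (not part of the MR3 release): print the Thrift address
# of Metastore for a Hive version (1, 2, or 5), using the variables from env.sh.
# The body runs in a subshell, so its variables do not leak into the caller.
metastore_uri() (
  version="$1"
  case "$version" in
    1|2|5) ;;
    *) echo "unsupported Hive version: $version" >&2; exit 1 ;;
  esac
  # Look up HIVE<version>_METASTORE_HOST and HIVE<version>_METASTORE_PORT.
  # eval is safe here because $version was validated above.
  eval "host=\${HIVE${version}_METASTORE_HOST:-}"
  eval "port=\${HIVE${version}_METASTORE_PORT:-}"
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "HIVE${version}_METASTORE_HOST or HIVE${version}_METASTORE_PORT is not set" >&2
    exit 1
  fi
  echo "thrift://${host}:${port}"
)
```

For example, after sourcing env.sh, metastore_uri 2 prints the address built from HIVE2_METASTORE_HOST and HIVE2_METASTORE_PORT.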

In order to start Metastore, execute hive/metastore-service.sh with the following options:

start                     # Start Metastore on port defined in HIVE?_METASTORE_PORT.
stop                      # Stop Metastore on port defined in HIVE?_METASTORE_PORT.
restart                   # Restart Metastore on port defined in HIVE?_METASTORE_PORT.
--local                   # Run jobs with configurations in conf/local/ (default).
--cluster                 # Run jobs with configurations in conf/cluster/.
--mysql                   # Run jobs with configurations in conf/mysql/.
--tpcds                   # Run jobs with configurations in conf/tpcds/.
--hivesrc1                # Choose hive1-mr3 (based on Hive 1.2.2) (default).
--hivesrc2                # Choose hive2-mr3 (based on Hive 2.3.3).
--hivesrc5                # Choose hive5-mr3 (based on Hive 3.1.0).
--init-schema             # Initialize the database schema. 
--hiveconf <key>=<value>  # Add a configuration key/value.
<Metastore option>        # Add a Metastore option.

  • The user should use --init-schema to initialize the database schema when running Metastore for the first time. Otherwise the script may fail with the following error message in the log:
    MetaException(message:Version information not found in metastore. )

    Initializing the database schema is also necessary for enabling ACID transactions in Hive.

  • If the database becomes corrupted, the user should delete it manually before restarting Metastore. For a Derby database, the user can simply delete the corresponding database directory, for example:
    rm -rf hive/hive-local-data/metastore/hive2mr3
    rm -rf hive/hive-local-data/metastore-cluster/hive2mr3

    For a MySQL database, the user should connect to the MySQL server and drop the database (e.g., with a DROP DATABASE statement).

  • The user can append as many Metastore options (for the command hive --service metastore from Hive) as necessary to the command.
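The check for an uninitialized schema can be sketched as a small shell helper that searches the log for the error message quoted above. The function name needs_init_schema is hypothetical, and the log path in the commented usage assumes the default location /tmp/<user name>/hive.log mentioned in this document.

```shell
# Hypothetical helper: succeed (exit status 0) if the given log file contains
# the error that Metastore reports when the database schema is uninitialized.
needs_init_schema() {
  # -q: no output, only an exit status; -F: treat the pattern as a fixed string.
  grep -q -F "Version information not found in metastore" "$1"
}

# Example use, commented out because it would start a real service:
# log="/tmp/$(id -un)/hive.log"
# if needs_init_schema "$log"; then
#   hive/metastore-service.sh start --mysql --hivesrc2 --init-schema
# fi
```

Because all Metastore instances started by the same user share one log file, a match may come from an earlier run, so the result is only a hint.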

To see the type of the database used by Metastore, find the configuration key javax.jdo.option.ConnectionDriverName in hive-site.xml. For example, with --tpcds, Metastore uses a MySQL database:

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

If the configuration key javax.jdo.option.ConnectionDriverName is missing in hive-site.xml, Metastore uses a Derby database by default, as is the case when starting Metastore with either --local or --cluster. With --mysql and --tpcds, it uses a MySQL database.
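The lookup described above can be sketched in shell. The helper name metastore_db_driver is hypothetical, and the sketch assumes that the <name> and <value> elements sit on separate lines as in the example; it is a simple text scan, not a full XML parser.

```shell
# Hypothetical helper: print the JDBC driver class configured in a
# hive-site.xml file, or "derby" if the key is missing (the default case).
metastore_db_driver() {
  awk '
    # Remember that the key was seen; its value is expected on a later line.
    /<name>javax\.jdo\.option\.ConnectionDriverName<\/name>/ { found = 1; next }
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")   # keep only the text inside <value>
      print
      printed = 1
      exit
    }
    END { if (!printed) print "derby" }
  ' "$1"
}
```

For a file containing the property shown above, the helper prints com.mysql.jdbc.Driver; for a file without the key, it prints derby. The path of the hive-site.xml to inspect depends on the chosen configuration directory (e.g., under conf/tpcds/).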

In order to use a MySQL database, the user (who starts Metastore) should have access to the database with a user name and a password, which should be explicitly set in hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hivemr3</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>

Here are examples of starting Metastore for the first time:

hive/metastore-service.sh start --local --hivesrc1
hive/metastore-service.sh start --mysql --hivesrc2 --init-schema

By default, the log file for starting Metastore is written to /tmp/<user name>/hive.log. Below is an example of messages printed to the log file when Metastore starts successfully:

2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: Started the new metaserver on port [9830]...
2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: Options.minWorkerThreads = 200
2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: Options.maxWorkerThreads = 1000
2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: TCP keepalive = true

Note that all instances of Metastore started by the same user share the same log file.
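A quick way to confirm a successful start is to look for the "Started the new metaserver" message in the log. The sketch below extracts the port from the most recent such message; the helper name metastore_started_port is hypothetical, and the message format is assumed to match the sample above.

```shell
# Hypothetical helper: print the port from the last "Started the new
# metaserver" message in the given log file, or nothing if there is none.
metastore_started_port() {
  sed -n 's/.*Started the new metaserver on port \[\([0-9][0-9]*\)].*/\1/p' "$1" | tail -n 1
}

# Example use with the default log location (commented out):
# metastore_started_port "/tmp/$(id -un)/hive.log"
```

Since the log file is shared, the port printed identifies which instance wrote the latest start message.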