Running Metastore
Hive can run only if Metastore is running.
Hive on MR3 can run with any instance of Metastore of the same version, not necessarily the one included in the MR3 release.
For example, if Metastore is already running in a Hadoop cluster, the user may reuse it without starting another instance of Metastore.
We recommend, however, the Metastore included in the MR3 release because it introduces a few improvements (e.g., https://github.com/apache/hive/pull/454).
In a multi-user environment, the administrator user (e.g., hive) typically starts Metastore.
In order to run the Metastore included in the MR3 release, set the following environment variables in env.sh as necessary:
HIVE1_DATABASE_HOST=$HOSTNAME
HIVE1_METASTORE_HOST=$HOSTNAME
HIVE1_METASTORE_PORT=9810
HIVE1_METASTORE_LOCAL_PORT=9811
HIVE1_DATABASE_NAME=hivemr3
HIVE1_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse
HIVE2_DATABASE_HOST=$HOSTNAME
HIVE2_METASTORE_HOST=$HOSTNAME
HIVE2_METASTORE_PORT=9820
HIVE2_METASTORE_LOCAL_PORT=9821
HIVE2_DATABASE_NAME=hive2mr3
HIVE2_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse
HIVE5_DATABASE_HOST=$HOSTNAME
HIVE5_METASTORE_HOST=$HOSTNAME
HIVE5_METASTORE_PORT=9850
HIVE5_METASTORE_LOCAL_PORT=9851
HIVE5_DATABASE_NAME=hive5mr3
HIVE5_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse
HIVE_METASTORE_HEAPSIZE=12288
HIVE_METASTORE_KERBEROS_PRINCIPAL=hive/_HOST@RED
HIVE_METASTORE_KERBEROS_KEYTAB=/etc/security/keytabs/hive.service.keytab
HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar
Note that env.sh specifies a Metastore address (host and port) for each version of Hive because of the incompatibility between different versions of Metastore.
- HIVE1_DATABASE_HOST specifies the host where the database for Metastore is running, whereas HIVE1_METASTORE_HOST specifies the host where Metastore itself is running.
- HIVE1_METASTORE_LOCAL_PORT specifies the port for Metastore running in local mode (in which everything runs on a single machine) with --hivesrc1. If the user does not need Hive on MR3 in local mode, this environment variable may be ignored.
- HIVE1_DATABASE_NAME specifies the database name for Metastore running with --hivesrc1.
- HIVE1_HDFS_WAREHOUSE specifies the directory for the Hive warehouse on HDFS for Metastore running in non-local mode with --hivesrc1. For local mode, Hive on MR3 creates a Hive warehouse under the installation directory. Note that different versions of Metastore can share the same Hive warehouse, while their databases cannot be shared.
- The corresponding HIVE2_* and HIVE5_* variables play the same roles for --hivesrc2 and --hivesrc5.
- HIVE_METASTORE_HEAPSIZE specifies the heap size (in megabytes) for Metastore.
- HIVE_METASTORE_KERBEROS_PRINCIPAL and HIVE_METASTORE_KERBEROS_KEYTAB specify the principal and keytab file for Metastore, and correspond to configuration keys hive.metastore.kerberos.principal and hive.metastore.kerberos.keytab.file in hive-site.xml.
- HIVE_MYSQL_DRIVER specifies the path to a MySQL connector jar file, which is necessary when using a MySQL database. One can download the official JDBC driver for MySQL at https://dev.mysql.com/downloads/connector/j/.
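To illustrate how the per-version variables fit together, here is a minimal sketch that derives a Metastore address from them. The helper metastore_uri is hypothetical and not part of the MR3 release. The host is fixed to localhost here instead of $HOSTNAME so that the output is predictable, and reusing HIVE?_METASTORE_HOST for local mode is an assumption. The thrift:// scheme is the one Hive uses for hive.metastore.uris.

```shell
# Sample values in the style of env.sh (host fixed for illustration).
HIVE1_METASTORE_HOST=localhost
HIVE1_METASTORE_PORT=9810
HIVE1_METASTORE_LOCAL_PORT=9811
HIVE5_METASTORE_HOST=localhost
HIVE5_METASTORE_PORT=9850
HIVE5_METASTORE_LOCAL_PORT=9851

# metastore_uri <version> [local]
# Hypothetical helper: prints the Metastore address for --hivesrc<version>.
# The body runs in a subshell, so its variables do not leak to the caller.
metastore_uri() (
  version=$1
  mode=${2:-cluster}
  if [ "$mode" = "local" ]; then
    port_name="HIVE${version}_METASTORE_LOCAL_PORT"
  else
    port_name="HIVE${version}_METASTORE_PORT"
  fi
  # Look up the variables whose names were built above.
  eval "host=\${HIVE${version}_METASTORE_HOST}"
  eval "port=\${${port_name}}"
  echo "thrift://${host}:${port}"
)

metastore_uri 1 local   # address for --hivesrc1 in local mode
metastore_uri 5         # address for --hivesrc5 in non-local mode
```

Because each version has its own host and port, instances for different versions can run side by side without colliding.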
In order to start Metastore, execute hive/metastore-service.sh with the following options:
start # Start Metastore on port defined in HIVE?_METASTORE_PORT.
stop # Stop Metastore on port defined in HIVE?_METASTORE_PORT.
restart # Restart Metastore on port defined in HIVE?_METASTORE_PORT.
--local # Run jobs with configurations in conf/local/ (default).
--cluster # Run jobs with configurations in conf/cluster/.
--mysql # Run jobs with configurations in conf/mysql/.
--tpcds # Run jobs with configurations in conf/tpcds/.
--hivesrc1 # Choose hive1-mr3 (based on Hive 1.2.2).
--hivesrc2 # Choose hive2-mr3 (based on Hive 2.3.4).
--hivesrc5 # Choose hive5-mr3 (based on Hive 3.1.1) (default).
--init-schema # Initialize the database schema.
--hiveconf <key>=<value> # Add a configuration key/value.
<Metastore option> # Add a Metastore option.
- The user should use --init-schema to initialize the database schema when running Metastore for the first time. Otherwise the script may fail with the following error message in the log:
  MetaException(message:Version information not found in metastore. )
  Initializing the database schema is also necessary for enabling ACID transactions in Hive.
- If the database becomes corrupt, the user should delete it manually before restarting Metastore. For a Derby database, the user can just delete the corresponding database directory as follows:
  rm -rf hive/hive-local-data/metastore/hive2mr3
  rm -rf hive/hive-local-data/metastore-cluster/hive2mr3
  For a MySQL database, the user should connect to the MySQL server and execute a command to delete it.
- The user can append as many Metastore options (for the command hive --service metastore from Hive) as necessary to the command.
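The missing-schema failure described in the first item can be detected by searching the log for the quoted error message. The following is a minimal sketch, not part of the MR3 release: the helper needs_init_schema is hypothetical, and the log path assumes the default location /tmp/<user name>/hive.log with the user name taken from id -un.

```shell
# needs_init_schema <log file>
# Hypothetical helper: succeeds (exit status 0) if the log contains the
# "Version information not found" error, and fails otherwise.
needs_init_schema() {
  grep -q 'Version information not found in metastore' "$1"
}

# Assumed default log location for the current user.
log_file="/tmp/$(id -un)/hive.log"
if [ -f "$log_file" ] && needs_init_schema "$log_file"; then
  echo "Schema not initialized; start Metastore again with --init-schema"
fi
```

Since the log file is shared and appended to, a match may come from an earlier run, so the result is only a hint.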
To see the type of the database used by Metastore, find the configuration key javax.jdo.option.ConnectionDriverName in hive-site.xml.
For example, with --tpcds, Metastore uses a MySQL database:
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
If the configuration key javax.jdo.option.ConnectionDriverName is missing in hive-site.xml, Metastore uses a Derby database by default, as is the case when starting Metastore with either --local or --cluster.
With --mysql and --tpcds, it uses a MySQL database.
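The rule above can be turned into a quick check. This is a minimal sketch under stated assumptions: the helper metastore_db_type is hypothetical, and it assumes that the <value> line directly follows the <name> line, as in the excerpt above. It relies on grep -A, which GNU and BSD grep provide.

```shell
# metastore_db_type <path to hive-site.xml>
# Hypothetical helper: prints "mysql", "derby", or "other".
# The body runs in a subshell, so its variables do not leak to the caller.
metastore_db_type() (
  # Take the line after the ConnectionDriverName <name> line and keep it
  # only if it is a <value> line; an absent key yields an empty string.
  driver=$(grep -A 1 'javax.jdo.option.ConnectionDriverName' "$1" | grep '<value>' || true)
  case "$driver" in
    *mysql*)    echo "mysql" ;;
    *derby*|"") echo "derby" ;;   # a missing key means Derby by default
    *)          echo "other" ;;
  esac
)
```

The function takes the path to whichever hive-site.xml is in use for the chosen configuration directory.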
In order to use a MySQL database, the user who starts Metastore should have access to the database with a user name and a password, which should be set explicitly in hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hivemr3</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
Here are examples of starting Metastore for the first time:
hive/metastore-service.sh start --local --hivesrc1
hive/metastore-service.sh start --mysql --hivesrc2 --init-schema
By default, the log file for starting Metastore is written to /tmp/<user name>/hive.log.
Below is an example of messages printed to the log file when Metastore starts successfully:
2018-03-12T14:52:24,611 INFO [main] metastore.HiveMetaStore: Started the new metaserver on port [9830]...
2018-03-12T14:52:24,611 INFO [main] metastore.HiveMetaStore: Options.minWorkerThreads = 200
2018-03-12T14:52:24,611 INFO [main] metastore.HiveMetaStore: Options.maxWorkerThreads = 1000
2018-03-12T14:52:24,611 INFO [main] metastore.HiveMetaStore: TCP keepalive = true
Note that all instances of Metastore started by the same user share the same log file.
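Because several instances may write to the same log file, it can help to list the ports on which Metastore has reported a successful start. The following is a minimal sketch: the helper started_ports is hypothetical, and it assumes the message format shown in the example above, which may differ between Hive versions.

```shell
# started_ports <log file>
# Hypothetical helper: prints one port per "Started the new metaserver"
# message found in the log, in the order the messages appear.
started_ports() {
  sed -n 's/.*Started the new metaserver on port \[\([0-9][0-9]*\)\].*/\1/p' "$1"
}

# Assumed default log location for the current user.
log_file="/tmp/$(id -un)/hive.log"
if [ -f "$log_file" ]; then
  started_ports "$log_file"
fi
```

A port that appears in the output only shows that an instance started at some point. It does not show that the instance is still running.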