Configuring Tez Runtime

The behavior of the Tez runtime is specified by the configuration file tez-site.xml on the classpath. MR3 inherits all configuration keys for the Tez runtime from the original Tez. For example, one inherited key specifies the amount of memory required for sorting the output. In addition, MR3 introduces a few configuration keys that are specific to new features in MR3. These configuration keys are described below.

Name | Default value | Description
tez.runtime.pipelined.sorter.use.soft.reference | false | true: use soft references for ByteBuffers allocated in PipelinedSorter. These soft references are reused across TaskAttempts running in the same ContainerWorker. false: do not use soft references.
[key name missing] | false | true: enable auto parallelism for ShuffleVertexManager. false: disable auto parallelism.
[key name missing] | 20 | Minimum number of Tasks to trigger auto parallelism. For example, if the value is set to 20, only those Vertexes with at least 20 Tasks are considered for auto parallelism.
[key name missing] | 10 | Specifies the percentage of Tasks that can be kept after applying auto parallelism. For example, if the value is set to 10, the number of Tasks can be reduced by up to 100 - 10 = 90 percent, thereby leaving 10 percent of Tasks.
tez.shuffle-vertex-manager.use-stats-auto-parallelism | false | true: analyze input statistics when applying auto parallelism. false: do not use input statistics.
[key name missing] | 20 | Specifies the lower limit when normalizing input statistics. For example, if the value is set to 20, input statistics are normalized between 20 and 100. That is, an input size of zero is normalized to 20 while the maximum input size is mapped to 100.
[key name missing] | mapreduce_shuffle | Service ID for the external shuffle service. Set to tez_shuffle in order to use the shuffle handler of MR3.
tez.shuffle.port | 15551 | Default port number for the shuffle handler of MR3.
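
As an illustration, the keys above whose names are given could be set in tez-site.xml roughly as follows. This is a minimal sketch that assumes the standard Hadoop-style configuration layout (a configuration element containing property entries with name and value). The values are illustrative choices, not recommendations, and keys whose names are not given above are left out.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Illustrative: reuse PipelinedSorter ByteBuffers via soft references
       across TaskAttempts in the same ContainerWorker (default is false). -->
  <property>
    <name>tez.runtime.pipelined.sorter.use.soft.reference</name>
    <value>true</value>
  </property>

  <!-- Illustrative: analyze input statistics when applying auto
       parallelism (default is false). -->
  <property>
    <name>tez.shuffle-vertex-manager.use-stats-auto-parallelism</name>
    <value>true</value>
  </property>

  <!-- Port for the shuffle handler of MR3; 15551 is the documented default,
       shown here only to make the setting explicit. -->
  <property>
    <name>tez.shuffle.port</name>
    <value>15551</value>
  </property>
</configuration>
```

Using input statistics presumably has an effect only when auto parallelism itself is enabled, so in practice the corresponding enabling key would be set to true in the same file.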