Configuring Tez Runtime

The behavior of the Tez runtime is specified by the configuration file tez-site.xml in the classpath. MR3 inherits all Tez runtime configuration keys from the original Tez. For example, tez.runtime.io.sort.mb specifies the amount of memory, in MB, used for sorting the output. In addition, MR3 introduces a few configuration keys that are specific to its new features. Below we describe these configuration keys.
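As a minimal sketch of how an inherited key is set, the fragment below shows tez.runtime.io.sort.mb in tez-site.xml using the standard Hadoop-style property layout. The value 1024 is a hypothetical illustration, not a recommended setting.

```xml
<?xml version="1.0"?>
<!-- tez-site.xml: must be on the classpath to take effect -->
<configuration>
  <!-- Inherited from original Tez: memory (in MB) used for sorting the output. -->
  <!-- 1024 is an assumed example value; tune it to the memory available per Task. -->
  <property>
    <name>tez.runtime.io.sort.mb</name>
    <value>1024</value>
  </property>
</configuration>
```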

| Name | Default value | Description |
|---|---|---|
| tez.runtime.pipelined.sorter.use.soft.reference | false | true: use soft references for ByteBuffers allocated in PipelinedSorter. These soft references are reused across TaskAttempts running in the same ContainerWorker. false: do not use soft references. |
| tez.shuffle-vertex-manager.enable.auto-parallel | false | true: enable auto parallelism for ShuffleVertexManager. false: disable auto parallelism. |
| tez.shuffle-vertex-manager.auto-parallel.min.num.tasks | 20 | Minimum number of Tasks to trigger auto parallelism. For example, if the value is set to 20, only those Vertexes with at least 20 Tasks are considered for auto parallelism. |
| tez.shuffle-vertex-manager.auto-parallel.max.reduction.percentage | 10 | Specifies the percentage of Tasks that can be kept after applying auto parallelism. For example, if the value is set to 10, the number of Tasks can be reduced by up to 100 - 10 = 90 percent, thereby leaving 10 percent of Tasks. |
| tez.shuffle-vertex-manager.use-stats-auto-parallelism | false | true: analyze input statistics when applying auto parallelism. false: do not use input statistics. |
| tez.shuffle.vertex.manager.auto.parallelism.min.percent | 20 | Specifies the lower limit when normalizing input statistics. For example, if the value is set to 20, input statistics are normalized between 20 and 100. That is, an input size of zero is normalized to 20 while the maximum input size is mapped to 100. |
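
The keys above can be combined in tez-site.xml roughly as follows. This is a sketch under stated assumptions rather than a recommended configuration: it enables auto parallelism and the use of input statistics (both off by default) and writes out the remaining keys with the default values listed above.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Enable auto parallelism for ShuffleVertexManager (default: false). -->
  <property>
    <name>tez.shuffle-vertex-manager.enable.auto-parallel</name>
    <value>true</value>
  </property>
  <!-- Only Vertexes with at least 20 Tasks are considered (default value). -->
  <property>
    <name>tez.shuffle-vertex-manager.auto-parallel.min.num.tasks</name>
    <value>20</value>
  </property>
  <!-- Up to 90 percent of Tasks may be removed, leaving 10 percent (default value). -->
  <property>
    <name>tez.shuffle-vertex-manager.auto-parallel.max.reduction.percentage</name>
    <value>10</value>
  </property>
  <!-- Analyze input statistics when applying auto parallelism (default: false). -->
  <property>
    <name>tez.shuffle-vertex-manager.use-stats-auto-parallelism</name>
    <value>true</value>
  </property>
  <!-- Input statistics are normalized between 20 and 100 (default value). -->
  <property>
    <name>tez.shuffle.vertex.manager.auto.parallelism.min.percent</name>
    <value>20</value>
  </property>
</configuration>
```

With these values, a Vertex with 200 Tasks qualifies for auto parallelism (200 is at least 20) and can be reduced to as few as 200 × 10 / 100 = 20 Tasks, while a Vertex with 15 Tasks is left unchanged. For the normalization, the description gives only the two endpoints. If the mapping between them is linear (an assumption, not stated above), an input half the maximum size would be normalized to 20 + 0.5 × (100 - 20) = 60.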