Configuring Tez Runtime
The behavior of Tez runtime is specified by the configuration file
tez-site.xml in the classpath.
MR3 inherits all configuration keys for Tez runtime from original Tez.
For example, tez.runtime.io.sort.mb specifies the amount of memory, in megabytes, used for sorting the output.
In addition, MR3 introduces a few configuration keys which are specific to new features in MR3.
Below we describe these configuration keys.
| Name | Default value | Description |
|---|---|---|
| tez.runtime.pipelined.sorter.use.soft.reference | false | true: use soft references for ByteBuffers allocated in PipelinedSorter. These soft references are reused across TaskAttempts running in the same ContainerWorker. false: do not use soft references. |
| tez.shuffle-vertex-manager.enable.auto-parallel | false | true: enable auto parallelism for ShuffleVertexManager. false: disable auto parallelism. |
| tez.shuffle-vertex-manager.auto-parallel.min.num.tasks | 20 | Minimum number of Tasks to trigger auto parallelism. For example, if the value is set to 20, only those Vertexes with at least 20 Tasks are considered for auto parallelism. |
| tez.shuffle-vertex-manager.auto-parallel.max.reduction.percentage | 10 | Specifies the percentage of Tasks that can be kept after applying auto parallelism. For example, if the value is set to 10, the number of Tasks can be reduced by up to 100 - 10 = 90 percent, thereby leaving 10 percent of Tasks. |
| tez.shuffle-vertex-manager.use-stats-auto-parallelism | false | true: analyze input statistics when applying auto parallelism. false: do not use input statistics. |
| tez.shuffle.vertex.manager.auto.parallelism.min.percent | 20 | Specifies the lower limit when normalizing input statistics. For example, if the value is set to 20, input statistics are normalized between 20 and 100. That is, an input size of zero is normalized to 20 while the maximum input size is mapped to 100. |
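
As a minimal sketch, the keys described above could be set in tez-site.xml as follows, assuming the standard Hadoop-style `<configuration>`/`<property>` layout. The values are illustrative only, not recommendations: the three boolean keys are switched to true to enable the features, and the numeric keys keep their default values.

```xml
<configuration>
  <!-- Illustrative values only; tune them for your own workload. -->

  <!-- Reuse sort buffers across TaskAttempts in the same ContainerWorker -->
  <property>
    <name>tez.runtime.pipelined.sorter.use.soft.reference</name>
    <value>true</value>
  </property>

  <!-- Enable auto parallelism for ShuffleVertexManager -->
  <property>
    <name>tez.shuffle-vertex-manager.enable.auto-parallel</name>
    <value>true</value>
  </property>

  <!-- Only Vertexes with at least 20 Tasks are considered -->
  <property>
    <name>tez.shuffle-vertex-manager.auto-parallel.min.num.tasks</name>
    <value>20</value>
  </property>

  <!-- Keep 10 percent of Tasks, i.e. reduce by up to 90 percent -->
  <property>
    <name>tez.shuffle-vertex-manager.auto-parallel.max.reduction.percentage</name>
    <value>10</value>
  </property>

  <!-- Analyze input statistics when applying auto parallelism -->
  <property>
    <name>tez.shuffle-vertex-manager.use-stats-auto-parallelism</name>
    <value>true</value>
  </property>

  <!-- Normalize input statistics between 20 and 100 -->
  <property>
    <name>tez.shuffle.vertex.manager.auto.parallelism.min.percent</name>
    <value>20</value>
  </property>
</configuration>
```

With these settings, a Vertex with 100 Tasks qualifies for auto parallelism because it has at least 20 Tasks, and it can be reduced to roughly 10 Tasks, which is 10 percent of the original 100.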