Class PointwiseVertexInputInfoComputer

java.lang.Object
org.apache.flink.runtime.scheduler.adaptivebatch.util.PointwiseVertexInputInfoComputer

public class PointwiseVertexInputInfoComputer extends Object
Helper class that computes VertexInputInfo for pointwise input.
  • Constructor Details

    • PointwiseVertexInputInfoComputer

      public PointwiseVertexInputInfoComputer()
  • Method Details

    • compute

      public Map<IntermediateDataSetID,JobVertexInputInfo> compute(List<BlockingInputInfo> inputInfos, int parallelism, int minParallelism, int maxParallelism, long dataVolumePerTask)
      Decides the parallelism and input infos so that data is distributed evenly across downstream subtasks for POINTWISE inputs, i.e. different downstream subtasks consume roughly the same amount of data.

      Assume that `inputInfo` has two partitions with three subpartitions each, whose data sizes in bytes are {0->[1,2,1], 1->[2,1,2]}, and that the expected parallelism is 3. The calculation proceeds as follows:
      1. Create subpartition slices for the input, which is composed of several subpartitions. The resulting slice list, with data sizes in bytes, is: [1,2,1,2,1,2]
      2. Split the subpartition slice list into n balanced parts (each described by an `IndexRange`, together named SubpartitionSliceRanges) based on data volume: [0,1],[2,3],[4,5]
      3. Reorganize the split results into a mapping of partition range to subpartition range: {0->[0,1]}, {0->[2,2],1->[0,0]}, {1->[1,2]}.
      The final result is the `SubpartitionGroup` that each of the three parallel tasks needs to subscribe to.
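
The three steps above can be traced with a small standalone sketch. It is not the Flink implementation: the class `PointwiseSketch` and the methods `splitIntoRanges` and `toPartitionRanges` are hypothetical names, the balancing is a simple greedy fill up to the ceiling of the average volume, and every partition is assumed to have the same number of subpartitions. Ranges are plain `int[] {start, end}` pairs rather than `IndexRange` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Flink.
final class PointwiseSketch {

    // Step 2: split the slice list into at most `parts` contiguous, inclusive
    // ranges {start, end}. Assumes a non-empty array and parts >= 1.
    static List<int[]> splitIntoRanges(long[] sliceBytes, int parts) {
        long total = Arrays.stream(sliceBytes).sum();
        long target = (total + parts - 1) / parts; // ceiling of the average
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        long sum = 0;
        for (int i = 0; i < sliceBytes.length; i++) {
            boolean lastRange = ranges.size() == parts - 1;
            // Close the current range once the next slice would exceed the target,
            // unless this is already the last allowed range.
            if (!lastRange && i > start && sum + sliceBytes[i] > target) {
                ranges.add(new int[] {start, i - 1});
                start = i;
                sum = 0;
            }
            sum += sliceBytes[i];
        }
        ranges.add(new int[] {start, sliceBytes.length - 1});
        return ranges;
    }

    // Step 3: map a slice range back to {partition -> {firstSubpartition, lastSubpartition}}.
    // Assumes slices are numbered partition by partition, one slice per subpartition.
    static Map<Integer, int[]> toPartitionRanges(int[] sliceRange, int subpartitionsPerPartition) {
        Map<Integer, int[]> result = new LinkedHashMap<>();
        for (int slice = sliceRange[0]; slice <= sliceRange[1]; slice++) {
            int partition = slice / subpartitionsPerPartition;
            int subpartition = slice % subpartitionsPerPartition;
            int[] range =
                    result.computeIfAbsent(partition, p -> new int[] {subpartition, subpartition});
            range[1] = subpartition; // extend the range within this partition
        }
        return result;
    }

    public static void main(String[] args) {
        // Step 1: flatten {0->[1,2,1], 1->[2,1,2]} into one slice list.
        long[] sliceBytes = {1, 2, 1, 2, 1, 2};
        for (int[] sliceRange : splitIntoRanges(sliceBytes, 3)) {
            StringBuilder line = new StringBuilder(Arrays.toString(sliceRange)).append(" =>");
            toPartitionRanges(sliceRange, 3)
                    .forEach((p, r) -> line.append(" ").append(p).append("->").append(Arrays.toString(r)));
            System.out.println(line);
        }
    }
}
```

With the example input, the greedy split yields the slice ranges [0,1], [2,3], [4,5], each carrying 3 bytes, and the second range maps to partition 0 subpartition 2 plus partition 1 subpartition 0, matching step 3. A real balancer may choose ranges differently when the volumes do not divide this evenly.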

      Parameters:
      inputInfos - The information of consumed blocking results
      parallelism - The parallelism of the job vertex
      minParallelism - The minimum parallelism
      maxParallelism - The maximum parallelism
      dataVolumePerTask - The proposed data volume per task for this set of input infos
      Returns:
      The vertex input infos computed for the decided parallelism, keyed by IntermediateDataSetID