Class AllToAllVertexInputInfoComputer

java.lang.Object
org.apache.flink.runtime.scheduler.adaptivebatch.util.AllToAllVertexInputInfoComputer

public class AllToAllVertexInputInfoComputer extends Object
Helper class that computes VertexInputInfo for all-to-all-like inputs.
  • Constructor Details

    • AllToAllVertexInputInfoComputer

      public AllToAllVertexInputInfoComputer(double skewedFactor, long defaultSkewedThreshold)
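The Javadoc does not spell out how skewedFactor and defaultSkewedThreshold interact. A plausible sketch, not Flink's actual implementation: a subpartition group could be treated as skewed when its size exceeds both skewedFactor times the median size and the default threshold (the method name isSkewed and the max-based rule are assumptions here):

```java
public class SkewDetection {
    /** Hypothetical skew check: a subpartition group is skewed when its
     *  byte size exceeds max(medianBytes * skewedFactor, defaultSkewedThreshold).
     *  This is a sketch of one plausible rule, not Flink's actual code. */
    static boolean isSkewed(long subpartitionBytes, long medianBytes,
                            double skewedFactor, long defaultSkewedThreshold) {
        long threshold = Math.max((long) (medianBytes * skewedFactor),
                                  defaultSkewedThreshold);
        return subpartitionBytes > threshold;
    }

    public static void main(String[] args) {
        // 1000 bytes vs. a median of 100 with factor 4.0 and threshold 256:
        // the effective threshold is max(400, 256) = 400, so 1000 is skewed.
        System.out.println(isSkewed(1000, 100, 4.0, 256)); // true
        System.out.println(isSkewed(300, 100, 4.0, 256));  // false
    }
}
```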
  • Method Details

    • compute

      public Map<IntermediateDataSetID,JobVertexInputInfo> compute(JobVertexID jobVertexId, List<BlockingInputInfo> inputInfos, int parallelism, int minParallelism, int maxParallelism, long dataVolumePerTask)
      Decides the parallelism and input infos for ALL_TO_ALL-like inputs so that data is evenly distributed to downstream subtasks, i.e., different downstream subtasks consume roughly the same amount of data.

      Assume there are two upstream input infos, each with three partitions and two subpartitions per partition. Their data bytes are: input1: 0->[1,1] 1->[2,2] 2->[3,3], input2: 0->[1,1] 1->[1,1] 2->[1,1]. This method processes the data as follows:
      1. Create subpartition slices for inputs with the same type number. Unlike the pointwise computer, this method creates subpartition slices through the following steps:
      First, reorganize the data by subpartition index: input1: {0->[1,2,3],1->[1,2,3]}, input2: {0->[1,1,1],1->[1,1,1]}.
      Second, split subpartitions with the same index into n relatively balanced parts (if possible): input1: {0->[1,2][3],1->[1,2][3]}, input2: {0->[1,1,1],1->[1,1,1]}.
      Then, perform a cartesian product operation to ensure data correctness: input1: {0->[1,2],0->[3],1->[1,2],1->[3]}, input2: {0->[1,1,1],0->[1,1,1],1->[1,1,1],1->[1,1,1]}.
      Finally, create subpartition slices based on the result of the previous step, i.e., each input has four balanced subpartition slices.
      2. Based on the above subpartition slices, calculate the subpartition slice range each task needs to subscribe to, considering data volume and parallelism constraints: [0,0],[1,1],[2,2],[3,3]
      3. Convert the calculated subpartition slice range to the form of partition index range -> subpartition index range:
      task0: input1: {[0,1]->[0]} input2:{[0,2]->[0]}
      task1: input1: {[2,2]->[0]} input2:{[0,2]->[0]}
      task2: input1: {[0,1]->[1]} input2:{[0,2]->[1]}
      task3: input1: {[2,2]->[1]} input2:{[0,2]->[1]}

      Parameters:
      jobVertexId - The job vertex id
      inputInfos - The information of consumed blocking results
      parallelism - The parallelism of the job vertex
      minParallelism - The minimum parallelism of the job vertex
      maxParallelism - The maximum parallelism of the job vertex
      dataVolumePerTask - The proposed data volume per task for this set of input infos
      Returns:
      the parallelism and vertex input infos
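The range assignment in step 2 of the example above can be sketched as grouping contiguous slices until the per-task data volume is reached. The exact interplay with min/max parallelism is not shown here; this is a simplified, self-contained sketch, not Flink's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class SliceRanges {
    /** Group contiguous subpartition slices into per-task index ranges so
     *  that each task receives at most dataVolumePerTask bytes where
     *  possible. Simplified sketch; the real computer also enforces the
     *  min/max parallelism constraints. */
    static List<int[]> assignRanges(long[] sliceBytes, long dataVolumePerTask) {
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        long sum = 0;
        for (int i = 0; i < sliceBytes.length; i++) {
            // Close the current range when the next slice would overflow it,
            // but never emit an empty range.
            if (i > start && sum + sliceBytes[i] > dataVolumePerTask) {
                ranges.add(new int[] {start, i - 1});
                start = i;
                sum = 0;
            }
            sum += sliceBytes[i];
        }
        ranges.add(new int[] {start, sliceBytes.length - 1});
        return ranges;
    }

    public static void main(String[] args) {
        // Four balanced slices of 6 bytes each (3 from input1 + 3 from
        // input2 per slice) with 6 bytes per task yields one slice per
        // task: [0,0],[1,1],[2,2],[3,3], as in the example above.
        for (int[] r : assignRanges(new long[] {6, 6, 6, 6}, 6)) {
            System.out.println("[" + r[0] + "," + r[1] + "]");
        }
    }
}
```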