Enum SubtaskStateMapper

java.lang.Object
java.lang.Enum<SubtaskStateMapper>
org.apache.flink.runtime.io.network.api.writer.SubtaskStateMapper
All Implemented Interfaces:
Serializable, Comparable<SubtaskStateMapper>

@Internal public enum SubtaskStateMapper extends Enum<SubtaskStateMapper>
The SubtaskStateMapper narrows down the subtasks that need to be read during rescaling to recover from a particular subtask when in-flight data has been stored in the checkpoint.

Mappings of old subtasks to new subtasks may be unique or non-unique. A unique assignment means that a particular old subtask is assigned to exactly one new subtask. Non-unique assignments require filtering downstream: the receiver side has to cross-verify for each deserialized record whether it truly belongs to the new subtask. Most SubtaskStateMapper constants produce only unique assignments and are thus optimal. Some mappers, such as RANGE, create a mixture of unique and non-unique mappings, where downstream tasks need to filter on some mapped subtasks.
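
The distinction between unique and non-unique assignments can be illustrated with a small, self-contained sketch. It is not Flink code: the class UniquenessSketch, the method nonUniqueOldSubtasks, and the int[][] representation of a new-to-old mapping are assumptions made purely for illustration. The sketch reports which old subtasks are assigned to more than one new subtask and would therefore need downstream filtering.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Flink.
final class UniquenessSketch {

    /**
     * newToOld[i] lists the old subtask indexes assigned to new subtask i.
     * Returns the old indexes that occur under more than one new subtask.
     * Assumes each inner array contains no duplicates.
     */
    static Set<Integer> nonUniqueOldSubtasks(int[][] newToOld) {
        Set<Integer> seen = new TreeSet<>();
        Set<Integer> shared = new TreeSet<>();
        for (int[] oldIndexes : newToOld) {
            for (int old : oldIndexes) {
                // add() returns false if the index was already assigned elsewhere.
                if (!seen.add(old)) {
                    shared.add(old);
                }
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // Mapping taken from the RANGE example: new 0 reads old 0 + 1, new 1 reads old 1 + 2.
        int[][] rangeLike = {{0, 1}, {1, 2}};
        System.out.println(nonUniqueOldSubtasks(rangeLike)); // prints [1]
    }
}
```

An empty result corresponds to a mapping with only unique assignments, where no downstream filtering is needed.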

  • Enum Constant Summary

    Enum Constants
    Enum Constant   Description
    ARBITRARY       Extra state is redistributed to other subtasks without any specific guarantee (only that up- and downstream are matched).
    FIRST           Restores extra subtasks to the first subtask.
    FULL            Replicates the state to all subtasks.
    RANGE           Remaps old ranges to new ranges.
    ROUND_ROBIN     Redistributes subtask state in a round robin fashion.
    UNSUPPORTED
  • Method Summary

    Modifier and Type   Method
        Description
    RescaleMappings   getNewToOldSubtasksMapping(int oldParallelism, int newParallelism)
        Returns a mapping from each new subtask index to all old subtask indexes.
    abstract int[]   getOldSubtasks(int newSubtaskIndex, int oldNumberOfSubtasks, int newNumberOfSubtasks)
        Returns all old subtask indexes that need to be read to restore all buffers for the given new subtask index on rescale.
    boolean   isAmbiguous()
        Returns true iff this mapper can potentially lead to ambiguous mappings where different new subtasks map to the same old subtask.
    static SubtaskStateMapper   valueOf(String name)
        Returns the enum constant of this type with the specified name.
    static SubtaskStateMapper[]   values()
        Returns an array containing the constants of this enum type, in the order they are declared.

    Methods inherited from class java.lang.Enum

    clone, compareTo, equals, finalize, getDeclaringClass, hashCode, name, ordinal, toString, valueOf

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, wait, wait, wait
  • Enum Constant Details

    • ARBITRARY

      public static final SubtaskStateMapper ARBITRARY
      Extra state is redistributed to other subtasks without any specific guarantee (only that up- and downstream are matched).
    • FIRST

      public static final SubtaskStateMapper FIRST
      Restores extra subtasks to the first subtask.
    • FULL

      public static final SubtaskStateMapper FULL
      Replicates the state to all subtasks. This rescaling causes a huge overhead and completely relies on filtering the data downstream.

      This strategy should only be used as a fallback.

    • RANGE

      public static final SubtaskStateMapper RANGE
      Remaps old ranges to new ranges. For minor rescaling that means that new subtasks are mostly assigned 2 old subtasks.

      Example:
      old assignment: 0 -> [0;43); 1 -> [43;87); 2 -> [87;128)
      new assignment: 0 -> [0;64); 1 -> [64;128)
      subtask 0 recovers data from old subtask 0 + 1 and subtask 1 recovers data from old subtask 1 + 2

      For any downscale from n to a parallelism in [n-1 .. n/2], each new subtask gets exactly two old subtasks assigned.

      For any upscale from n to a parallelism in [n+1 .. 2*n-1], most subtasks get two old subtasks assigned, except the two outermost.

      Larger scale factors (<n/2, >2*n) increase the number of old subtasks accordingly. However, they also create more unique assignments, where an old subtask is exclusively assigned to a new subtask. Thus, the number of non-unique mappings is bounded above by 2*n.
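
The range remapping in the example above can be reproduced with a minimal sketch. It is not the Flink implementation: the class RangeOverlapSketch, the method oldSubtasksFor, and the encoding of ranges as boundary arrays are assumptions made for illustration. The boundary values are taken from the example; how Flink actually derives range boundaries is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink.
final class RangeOverlapSketch {

    /**
     * A bounds array has parallelism + 1 entries; subtask i owns the
     * half-open range [bounds[i]; bounds[i + 1]).
     * Returns every old subtask whose range overlaps the range of newIndex.
     */
    static int[] oldSubtasksFor(int newIndex, int[] oldBounds, int[] newBounds) {
        int newStart = newBounds[newIndex];
        int newEnd = newBounds[newIndex + 1];
        List<Integer> result = new ArrayList<>();
        for (int old = 0; old + 1 < oldBounds.length; old++) {
            // Two half-open ranges overlap iff each starts before the other ends.
            if (oldBounds[old] < newEnd && newStart < oldBounds[old + 1]) {
                result.add(old);
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] oldBounds = {0, 43, 87, 128}; // 0 -> [0;43); 1 -> [43;87); 2 -> [87;128)
        int[] newBounds = {0, 64, 128};     // 0 -> [0;64); 1 -> [64;128)
        System.out.println(Arrays.toString(oldSubtasksFor(0, oldBounds, newBounds))); // prints [0, 1]
        System.out.println(Arrays.toString(oldSubtasksFor(1, oldBounds, newBounds))); // prints [1, 2]
    }
}
```

Old subtask 1 appears in both results, which is the non-unique part of the mapping that downstream tasks have to filter.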

    • ROUND_ROBIN

      public static final SubtaskStateMapper ROUND_ROBIN
      Redistributes subtask state in a round robin fashion. Returns a mapping of newIndex -> oldIndexes. The mapping is accessed by using Bitset oldIndexes = mapping.get(newIndex).

      For oldParallelism < newParallelism, that mapping is trivial, for example if oldParallelism = 6 and newParallelism = 10:

      New index   Old indexes
      0           0
      1           1
      ...         ...
      5           5
      6
      ...
      9

      For oldParallelism > newParallelism, new indexes get multiple assignments by wrapping around assignments in a round-robin fashion, for example if oldParallelism = 10 and newParallelism = 4:

      New index   Old indexes
      0           0, 4, 8
      1           1, 5, 9
      2           2, 6
      3           3, 7
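
Both tables above are consistent with assigning each old index to the new index (old index modulo newParallelism). The following sketch illustrates that rule; it is an assumption inferred from the two examples, not a copy of the Flink implementation, and the class RoundRobinSketch and its method oldSubtasksFor are hypothetical names.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical helper, not part of Flink.
final class RoundRobinSketch {

    /** Returns all old indexes that wrap around onto the given new index. */
    static int[] oldSubtasksFor(int newIndex, int oldParallelism, int newParallelism) {
        return IntStream.range(0, oldParallelism)
                .filter(old -> old % newParallelism == newIndex)
                .toArray();
    }

    public static void main(String[] args) {
        // Downscale 10 -> 4: new index 0 collects old indexes 0, 4 and 8.
        System.out.println(Arrays.toString(oldSubtasksFor(0, 10, 4))); // prints [0, 4, 8]
        // Upscale 6 -> 10: new index 7 has no old subtask to read from.
        System.out.println(Arrays.toString(oldSubtasksFor(7, 6, 10))); // prints []
    }
}
```

Under this rule every old index lands on exactly one new index, so the resulting assignments are unique.
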
    • UNSUPPORTED

      public static final SubtaskStateMapper UNSUPPORTED
  • Method Details

    • values

      public static SubtaskStateMapper[] values()
      Returns an array containing the constants of this enum type, in the order they are declared.
      Returns:
      an array containing the constants of this enum type, in the order they are declared
    • valueOf

      public static SubtaskStateMapper valueOf(String name)
      Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
      Parameters:
      name - the name of the enum constant to be returned.
      Returns:
      the enum constant with the specified name
      Throws:
      IllegalArgumentException - if this enum type has no constant with the specified name
      NullPointerException - if the argument is null
    • getOldSubtasks

      public abstract int[] getOldSubtasks(int newSubtaskIndex, int oldNumberOfSubtasks, int newNumberOfSubtasks)
      Returns all old subtask indexes that need to be read to restore all buffers for the given new subtask index on rescale.
    • getNewToOldSubtasksMapping

      public RescaleMappings getNewToOldSubtasksMapping(int oldParallelism, int newParallelism)
      Returns a mapping from each new subtask index to all old subtask indexes.
    • isAmbiguous

      public boolean isAmbiguous()
      Returns true iff this mapper can potentially lead to ambiguous mappings where different new subtasks map to the same old subtask. The assumption is that such replicated data needs to be filtered.