Class FailureHandlingResult

java.lang.Object
org.apache.flink.runtime.executiongraph.failover.FailureHandlingResult

public class FailureHandlingResult extends Object
Result containing the tasks to restart upon a task failure. Also contains the reason for the failure and, if the failure is recoverable, the vertices to restart (as opposed to a non-recoverable failure type or a restart suppressed by the restart strategy).
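The following is a minimal sketch of how a caller might branch on such a result. It does not use the Flink class: `SketchResult` and `FailureHandlingSketch` are hypothetical stand-ins that mirror only `canRestart()`, `getVerticesToRestart()`, `getRestartDelayMS()` and `getError()`, with plain strings in place of `ExecutionVertexID`. The rule that a null vertex set means "not restartable" is an assumption taken from the `verticesToRestart` parameter description.

```java
import java.util.Set;

// Hypothetical stand-in, NOT the Flink class. It mirrors only four accessors
// of FailureHandlingResult and uses plain strings instead of ExecutionVertexID.
final class SketchResult {
    private final Set<String> verticesToRestart; // assumption: null means "not restartable"
    private final long restartDelayMS;
    private final Throwable error;

    SketchResult(Set<String> verticesToRestart, long restartDelayMS, Throwable error) {
        this.verticesToRestart = verticesToRestart;
        this.restartDelayMS = restartDelayMS;
        this.error = error;
    }

    boolean canRestart() { return verticesToRestart != null; }
    Set<String> getVerticesToRestart() { return verticesToRestart; }
    long getRestartDelayMS() { return restartDelayMS; }
    Throwable getError() { return error; }
}

final class FailureHandlingSketch {
    // Summarizes what a caller would do next: restart the listed tasks after
    // the delay, or give up and report the error.
    static String describe(SketchResult result) {
        if (result.canRestart()) {
            return "restart " + result.getVerticesToRestart().size()
                    + " task(s) after " + result.getRestartDelayMS() + " ms";
        }
        Throwable error = result.getError();
        return "fail: " + (error == null ? "unknown cause" : error.getMessage());
    }
}
```

Checking `canRestart()` before reading the vertices or the delay keeps the restartable and non-recoverable paths separate, which matches the two factory methods documented below.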
  • Method Details

    • getVerticesToRestart

      public Set<ExecutionVertexID> getVerticesToRestart()
      Returns the tasks to restart.
      Returns:
      the tasks to restart
    • getRestartDelayMS

      public long getRestartDelayMS()
      Returns the delay before the restart, in milliseconds.
      Returns:
      the delay before the restart, in milliseconds
    • getFailedExecution

      public Optional<Execution> getFailedExecution()
      Returns an Optional with the Execution causing this failure or an empty Optional if it's a global failure.
      Returns:
      The Optional with the failed Execution or an empty Optional if it's a global failure.
    • getError

      @Nullable public Throwable getError()
      Returns the reason why the restart cannot be performed.
      Returns:
      the reason why the restart cannot be performed
    • getFailureLabels

      public CompletableFuture<Map<String,String>> getFailureLabels()
      Returns the future of the labels associated with the failure.
      Returns:
      a CompletableFuture that completes with the Map of String labels
    • getTimestamp

      public long getTimestamp()
      Returns the time of the failure.
      Returns:
      The timestamp.
    • canRestart

      public boolean canRestart()
      Returns whether the restart can be performed.
      Returns:
      whether the restart can be performed
    • isGlobalFailure

      public boolean isGlobalFailure()
      Checks if this failure was a global failure, i.e., coming from a "safety net" failover that involved all tasks and should also reset components such as the coordinators.
    • isRootCause

      public boolean isRootCause()
      Returns:
      true if the current failure starts a new attempt; false if an earlier failure has not yet been retried, in which case the current failure is merged into that earlier attempt and the merged exceptions are treated as concurrent exceptions.
    • restartable

      public static FailureHandlingResult restartable(@Nullable Execution failedExecution, @Nullable Throwable cause, long timestamp, CompletableFuture<Map<String,String>> failureLabels, Set<ExecutionVertexID> verticesToRestart, long restartDelayMS, boolean globalFailure, boolean isRootCause)
      Creates a result containing the set of tasks to restart in order to recover from the failure.

      The result can be flagged to be from a global failure triggered by the scheduler, rather than from the failure of an individual task.

      Parameters:
      failedExecution - the Execution from which the failure originates. Passing null indicates that the failure was issued by Flink itself.
      cause - the reason for the failure.
      timestamp - the time of the failure.
      failureLabels - future of the map of labels characterizing the failure, produced by the FailureEnrichers.
      verticesToRestart - the task vertices to restart to recover from the failure. null indicates that the failure is not restartable.
      restartDelayMS - the delay before the restart, in milliseconds.
      globalFailure - whether the failure is a global failure that involved all tasks, rather than the failure of an individual task.
      isRootCause - true if the failure starts a new attempt; false if it is merged into an earlier failure that has not been retried yet.
      Returns:
      a result containing the set of tasks to restart to recover from the failure
    • unrecoverable

      public static FailureHandlingResult unrecoverable(@Nullable Execution failedExecution, @Nonnull Throwable error, long timestamp, CompletableFuture<Map<String,String>> failureLabels, boolean globalFailure, boolean isRootCause)
      Creates a result indicating that the failure is not recoverable and that no restart should be performed.

      The result can be flagged to be from a global failure triggered by the scheduler, rather than from the failure of an individual task.

      Parameters:
      failedExecution - the Execution from which the failure originates. Passing null indicates that the failure was issued by Flink itself.
      error - the reason why the failure is not recoverable.
      timestamp - the time of the failure.
      failureLabels - future of the map of labels characterizing the failure, produced by the FailureEnrichers.
      globalFailure - whether the failure is a global failure that involved all tasks, rather than the failure of an individual task.
      isRootCause - true if the failure starts a new attempt; false if it is merged into an earlier failure that has not been retried yet.
      Returns:
      result indicating the failure is not recoverable
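
Both factory methods take the failure labels as a `CompletableFuture<Map<String,String>>`, and `getFailureLabels()` returns that future rather than a finished map. The sketch below shows one way a caller might consume such a future without blocking, using only the JDK. The class `FailureLabelsSketch`, the method `describeLabels` and the output format are hypothetical and not part of Flink.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper, not part of Flink: formats a line from the value of a
// labels future (as returned by getFailureLabels()) and a failure timestamp
// (as returned by getTimestamp()).
final class FailureLabelsSketch {
    // Returns a future that completes with the formatted line once the labels
    // are available. The calling thread is not blocked.
    static CompletableFuture<String> describeLabels(
            CompletableFuture<Map<String, String>> failureLabels, long timestamp) {
        // TreeMap sorts the keys so the output order is deterministic.
        return failureLabels.thenApply(
                labels -> "failure at " + timestamp + " labels=" + new TreeMap<>(labels));
    }
}
```

For example, passing an already completed future holding the single (made-up) label `type=USER` with timestamp `42` yields a future that completes with `failure at 42 labels={type=USER}`.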