Class ExecutionFailureHandler

java.lang.Object
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler

public class ExecutionFailureHandler extends Object
This handler processes task failures and returns a FailureHandlingResult, which either lists the tasks to restart in order to recover or reports that the failure is not recoverable.
  • Field Details

  • Constructor Details

    • ExecutionFailureHandler

      public ExecutionFailureHandler(org.apache.flink.configuration.Configuration jobMasterConfig, SchedulingTopology schedulingTopology, FailoverStrategy failoverStrategy, RestartBackoffTimeStrategy restartBackoffTimeStrategy, org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor mainThreadExecutor, Collection<org.apache.flink.core.failure.FailureEnricher> failureEnrichers, org.apache.flink.core.failure.FailureEnricher.Context taskFailureCtx, org.apache.flink.core.failure.FailureEnricher.Context globalFailureCtx, org.apache.flink.metrics.MetricGroup metricGroup)
      Creates the handler to deal with task failures.
      Parameters:
      schedulingTopology - contains the topology info for failover
      failoverStrategy - helps to decide tasks to restart on task failures
      restartBackoffTimeStrategy - helps to decide whether to restart failed tasks and the restarting delay
      mainThreadExecutor - the main thread executor of the job master
      failureEnrichers - a collection of FailureEnricher that enrich failures
      taskFailureCtx - the task failure Context used by the FailureEnrichers
      globalFailureCtx - the global failure Context used by the FailureEnrichers
  • Method Details

    • getFailureHandlingResult

      public FailureHandlingResult getFailureHandlingResult(Execution failedExecution, Throwable cause, long timestamp)
      Returns the result of handling a task failure. The result is either a set of task vertices to restart together with a restart delay, or an indication that the failure is not recoverable along with the reason.
      Parameters:
      failedExecution - the failed execution
      cause - the cause of the task failure
      timestamp - the time of the task failure
      Returns:
      result of the failure handling
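
The two outcomes described above (a set of vertices to restart after a delay, or an unrecoverable failure with its reason) can be modeled roughly as follows. This is a minimal, self-contained sketch: `ResultSketch` and all of its methods are hypothetical stand-ins written for illustration, not the actual FailureHandlingResult API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical stand-in for a failure handling result; not the Flink class. */
final class ResultSketch {
    private final Set<String> verticesToRestart;
    private final long restartDelayMs;
    private final Throwable unrecoverableReason; // null when a restart is possible

    private ResultSketch(Set<String> vertices, long restartDelayMs, Throwable reason) {
        this.verticesToRestart = Collections.unmodifiableSet(new LinkedHashSet<>(vertices));
        this.restartDelayMs = restartDelayMs;
        this.unrecoverableReason = reason;
    }

    /** Outcome 1: restart the given vertices after the given delay. */
    static ResultSketch restartable(Set<String> vertices, long restartDelayMs) {
        return new ResultSketch(vertices, restartDelayMs, null);
    }

    /** Outcome 2: the failure cannot be recovered from; keep the reason. */
    static ResultSketch unrecoverable(Throwable reason) {
        Objects.requireNonNull(reason, "reason");
        return new ResultSketch(Collections.<String>emptySet(), 0L, reason);
    }

    boolean canRestart() {
        return unrecoverableReason == null;
    }

    /** Human-readable summary of whichever outcome this result carries. */
    String describe() {
        if (canRestart()) {
            return "restart " + verticesToRestart + " after " + restartDelayMs + " ms";
        }
        return "unrecoverable: " + unrecoverableReason.getMessage();
    }

    public static void main(String[] args) {
        Set<String> vertices = new LinkedHashSet<>(Arrays.asList("v1", "v2"));
        System.out.println(restartable(vertices, 500L).describe());
        System.out.println(unrecoverable(new RuntimeException("boom")).describe());
    }
}
```

A caller would typically branch on the restart check first and only read the vertices and delay when a restart is possible.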
    • getGlobalFailureHandlingResult

      public FailureHandlingResult getGlobalFailureHandlingResult(Throwable cause, long timestamp)
      Returns the result of handling a global failure. The result is either a set of task vertices to restart together with a restart delay, or an indication that the failure is not recoverable along with the reason.
      Parameters:
      cause - the cause of the global failure
      timestamp - the time of the global failure
      Returns:
      result of the failure handling
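
The constructor parameters suggest a division of labor: the failover strategy picks the tasks to restart, and the restart backoff strategy decides whether a restart is allowed and how long to wait. The sketch below illustrates that division for a task failure versus a global failure. Everything in it is an assumption made for illustration: `HandlerSketch`, `Decision`, the fixed restart budget standing in for a backoff strategy, and the choice to restart every vertex on a global failure are not taken from the Flink implementation.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical sketch of how a failure handler could combine its collaborators. */
final class HandlerSketch {

    /** Outcome of one failure: either vertices plus a delay, or no restart. */
    static final class Decision {
        final boolean canRestart;
        final Set<String> vertices;
        final long delayMs;

        Decision(boolean canRestart, Set<String> vertices, long delayMs) {
            this.canRestart = canRestart;
            this.vertices = vertices;
            this.delayMs = delayMs;
        }
    }

    private final Set<String> allVertices;
    // Maps a failed vertex to the vertices that must restart with it.
    private final Function<String, Set<String>> failoverStrategy;
    // Stand-in for a backoff strategy: a fixed budget and a fixed delay.
    private final int maxRestarts;
    private final long delayMs;
    private long numberOfRestarts;

    HandlerSketch(Set<String> allVertices,
                  Function<String, Set<String>> failoverStrategy,
                  int maxRestarts,
                  long delayMs) {
        this.allVertices = new TreeSet<>(allVertices);
        this.failoverStrategy = failoverStrategy;
        this.maxRestarts = maxRestarts;
        this.delayMs = delayMs;
    }

    /** Task failure: the failover strategy chooses what to restart. */
    Decision onTaskFailure(String failedVertex) {
        return decide(failoverStrategy.apply(failedVertex));
    }

    /** Global failure: assumed here to restart every vertex. */
    Decision onGlobalFailure() {
        return decide(allVertices);
    }

    private Decision decide(Set<String> vertices) {
        if (numberOfRestarts >= maxRestarts) {
            // Restart budget exhausted: report that no restart is possible.
            return new Decision(false, Collections.<String>emptySet(), 0L);
        }
        numberOfRestarts++;
        return new Decision(true, new TreeSet<>(vertices), delayMs);
    }

    long getNumberOfRestarts() {
        return numberOfRestarts;
    }
}
```

In this sketch only permitted restarts are counted, so the counter stops growing once the budget is spent; whether the real handler counts the same way is not stated here.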
    • isUnrecoverableError

      public static boolean isUnrecoverableError(Throwable cause)
    • getNumberOfRestarts

      public long getNumberOfRestarts()