Class CheckpointFailureManager
java.lang.Object
org.apache.flink.runtime.checkpoint.CheckpointFailureManager
The checkpoint failure manager, which centralizes the checkpoint failure processing logic.
Nested Class Summary
Nested Classes
static interface CheckpointFailureManager.FailJobCallback
    A callback interface that defines how to fail a job.
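Field Summary
Fields
static final int UNLIMITED_TOLERABLE_FAILURE_NUMBER
static final String EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE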
Constructor Summary
Constructors
CheckpointFailureManager(int tolerableCpFailureNumber, CheckpointFailureManager.FailJobCallback failureCallback)
Method Summary
void checkFailureCounter(CheckpointException exception, long checkpointId)
void handleCheckpointException(PendingCheckpoint pendingCheckpoint, CheckpointProperties checkpointProperties, CheckpointException exception, ExecutionAttemptID executionAttemptID, org.apache.flink.api.common.JobID job, PendingCheckpointStats pendingCheckpointStats, CheckpointStatsTracker statsTracker)
    Failures on JM: all checkpoints go against the failure counter.
void handleCheckpointSuccess(long checkpointId)
    Handle checkpoint success.
Field Details
UNLIMITED_TOLERABLE_FAILURE_NUMBER
public static final int UNLIMITED_TOLERABLE_FAILURE_NUMBER
EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE
public static final String EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE
Constructor Details
CheckpointFailureManager
public CheckpointFailureManager(int tolerableCpFailureNumber, CheckpointFailureManager.FailJobCallback failureCallback)
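The constructor pairs a tolerance threshold with a callback that fails the job. The sketch below models that contract in plain Java so it runs without Flink on the classpath. `FailJobCallbackSketch` and `FailureManagerConfigSketch` are hypothetical names; the non-negative check and the `Integer.MAX_VALUE` value for the unlimited sentinel are assumptions to verify against the Flink source for your version, and the real `CheckpointFailureManager.FailJobCallback` may declare different methods.

```java
import java.util.Objects;

// Hypothetical stand-in for the nested callback type. The real
// CheckpointFailureManager.FailJobCallback may declare different methods.
interface FailJobCallbackSketch {
    void failJob(Throwable cause);
}

// Sketch of the constructor contract; the validation rule and sentinel value are assumptions.
final class FailureManagerConfigSketch {
    // Assumed sentinel meaning "tolerate any number of checkpoint failures".
    static final int UNLIMITED_TOLERABLE_FAILURE_NUMBER = Integer.MAX_VALUE;

    final int tolerableCpFailureNumber;
    final FailJobCallbackSketch failureCallback;

    FailureManagerConfigSketch(int tolerableCpFailureNumber, FailJobCallbackSketch failureCallback) {
        // This sketch rejects a negative tolerance.
        if (tolerableCpFailureNumber < 0) {
            throw new IllegalArgumentException("tolerableCpFailureNumber must be >= 0");
        }
        this.tolerableCpFailureNumber = tolerableCpFailureNumber;
        // A callback is mandatory: without it there is no way to fail the job.
        this.failureCallback = Objects.requireNonNull(failureCallback, "failureCallback");
    }

    // True when the sentinel is used, i.e. counted failures never trigger the callback.
    boolean isUnlimited() {
        return tolerableCpFailureNumber == UNLIMITED_TOLERABLE_FAILURE_NUMBER;
    }
}
```

Because the callback is a single-method interface in this sketch, a lambda such as `cause -> { /* fail the job */ }` is enough to construct an instance.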
Method Details
handleCheckpointException
public void handleCheckpointException(@Nullable PendingCheckpoint pendingCheckpoint, CheckpointProperties checkpointProperties, CheckpointException exception, @Nullable ExecutionAttemptID executionAttemptID, org.apache.flink.api.common.JobID job, @Nullable PendingCheckpointStats pendingCheckpointStats, CheckpointStatsTracker statsTracker)
Failures on JM:
- all checkpoints - go against the failure counter.
- any savepoints - do nothing; this requires manual action, and a failover will not help anyway.
Failures on TM:
- all checkpoints - go against the failure counter (a failover might help, and we want to notify users).
- sync savepoints - always fail the job; otherwise we risk a deadlock in which job cancellation waits for a savepoint that never finishes.
- non-sync savepoints - go against the failure counter (a failover might help solve the problem).
Parameters:
pendingCheckpoint - the failed checkpoint, if it was already initialized.
checkpointProperties - the checkpoint properties, used to determine which handling strategy applies.
exception - the checkpoint exception.
executionAttemptID - the execution attempt ID, as a safeguard.
job - the JobID.
pendingCheckpointStats - the pending checkpoint statistics.
statsTracker - the tracker for checkpoint statistics.
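The JM/TM rules for handleCheckpointException form a small decision table. As a minimal sketch, assuming the failure source and the snapshot kind are already known (how the real method tells JM failures from TM failures is not described here), the routing could be written as follows. `FailureSource`, `SnapshotKind`, `FailureAction` and `FailureRouting` are hypothetical types, not Flink classes.

```java
// Hypothetical model of the documented routing rules; none of these are Flink types.
enum FailureSource { JOB_MANAGER, TASK_MANAGER }

enum SnapshotKind { CHECKPOINT, SYNC_SAVEPOINT, NON_SYNC_SAVEPOINT }

enum FailureAction { COUNT_FAILURE, IGNORE, FAIL_JOB }

final class FailureRouting {
    private FailureRouting() {}

    static FailureAction decide(FailureSource source, SnapshotKind kind) {
        if (kind == SnapshotKind.CHECKPOINT) {
            // Checkpoint failures go against the failure counter on both JM and TM.
            return FailureAction.COUNT_FAILURE;
        }
        if (source == FailureSource.JOB_MANAGER) {
            // Any savepoint failing on the JM needs manual action; failover will not help.
            return FailureAction.IGNORE;
        }
        // Savepoint failures reported from a TM.
        return kind == SnapshotKind.SYNC_SAVEPOINT
                ? FailureAction.FAIL_JOB        // avoid the cancellation deadlock
                : FailureAction.COUNT_FAILURE;  // a failover might help
    }
}
```

For example, `FailureRouting.decide(FailureSource.TASK_MANAGER, SnapshotKind.SYNC_SAVEPOINT)` returns `FAIL_JOB`, while the same savepoint kind failing on the JM returns `IGNORE`.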
checkFailureCounter
public void checkFailureCounter(CheckpointException exception, long checkpointId)
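No description is given for checkFailureCounter. Going by its parameters and by the note on handleCheckpointSuccess about counting continuous failures, one plausible model is a counter that counts each failed checkpoint ID at most once and invokes a fail-job callback once the count exceeds the tolerable number (so a tolerance of 0 fails on the first counted failure). The class below is a hypothetical sketch under those assumptions, not Flink's implementation; `ContinuousFailureCounter` and its methods are illustrative names, and a `Consumer<String>` stands in for the callback.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical, simplified continuous-failure counter; the rules are assumptions.
final class ContinuousFailureCounter {
    private final int tolerableFailures;
    private final Consumer<String> failJob;              // stands in for a fail-job callback
    private final Set<Long> countedCheckpointIds = new HashSet<>();
    private int continuousFailures;

    ContinuousFailureCounter(int tolerableFailures, Consumer<String> failJob) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
        this.failJob = failJob;
    }

    // Records a failure; the same checkpoint ID is counted at most once.
    void onCheckpointFailure(long checkpointId) {
        if (countedCheckpointIds.add(checkpointId)) {
            continuousFailures++;
        }
        if (continuousFailures > tolerableFailures) {
            // Reset first so the callback fires once per exceeded streak.
            continuousFailures = 0;
            countedCheckpointIds.clear();
            failJob.accept("Exceeded tolerable checkpoint failures");
        }
    }

    int continuousFailures() {
        return continuousFailures;
    }
}
```

Counting by ID guards against one checkpoint being reported as failed by several tasks and inflating the count.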
handleCheckpointSuccess
public void handleCheckpointSuccess(long checkpointId)
Handle checkpoint success.
Parameters:
checkpointId - the ID of the successful checkpoint, used to count the number of continuous failures based on the checkpoint ID sequence.
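The checkpointId parameter ties failure counting to the checkpoint ID sequence. One way to read that, sketched below under stated assumptions, is that a success ends the failure streak only for checkpoints whose IDs are at or below the successful one, and that late failures of checkpoints older than the last success are ignored. `SequenceAwareFailureStreak` is a hypothetical class, not part of Flink.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a failure streak ordered by checkpoint ID (rules are assumptions).
final class SequenceAwareFailureStreak {
    private long lastSucceededCheckpointId = Long.MIN_VALUE;
    private final Set<Long> failedIds = new HashSet<>();

    // Counts a failure only if it belongs to a checkpoint newer than the last success.
    void onFailure(long checkpointId) {
        if (checkpointId > lastSucceededCheckpointId) {
            failedIds.add(checkpointId);
        }
    }

    // A newer success clears failures of checkpoints up to and including its ID.
    void onSuccess(long checkpointId) {
        if (checkpointId > lastSucceededCheckpointId) {
            lastSucceededCheckpointId = checkpointId;
            failedIds.removeIf(id -> id <= checkpointId);
        }
    }

    int streak() {
        return failedIds.size();
    }
}
```

With concurrent checkpoints this ordering matters: if checkpoint 5 fails and checkpoint 4 then completes, the streak stays at 1 rather than dropping to 0.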