Class AbstractHaServices

java.lang.Object
org.apache.flink.runtime.highavailability.AbstractHaServices
All Implemented Interfaces:
AutoCloseable, GloballyCleanableResource, ClientHighAvailabilityServices, HighAvailabilityServices
Direct Known Subclasses:
ZooKeeperLeaderElectionHaServices

public abstract class AbstractHaServices extends Object implements HighAvailabilityServices
Abstract high availability services based on a distributed system (e.g. ZooKeeper, Kubernetes). It helps create all the leader election/retrieval services and handles the cleanup. Implementations must return a proper leader name from getLeaderPathForResourceManager(), getLeaderPathForDispatcher(), getLeaderPathForJobManager(org.apache.flink.api.common.JobID), and getLeaderPathForRestServer(). The returned leader name is the ConfigMap name in Kubernetes and the child path in ZooKeeper.

close() and cleanupAllData() should be implemented to destroy the resources.

The abstract class is also responsible for determining which component services should be reused. For example, jobResultStore is created once and can be reused many times.
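
The "created once, reused many times" behaviour described above can be illustrated with a small, self-contained sketch. It does not use Flink types: DemoStore, ReusingServices, and CountingServices are hypothetical stand-ins, and the lazy, synchronized creation shown here is an assumption about one way to achieve reuse, not a description of how AbstractHaServices is actually implemented.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ReuseSketch {
    /** Hypothetical stand-in for a reusable component such as a job result store. */
    static final class DemoStore {}

    /** Hypothetical base class: builds the store on first request, then returns the same instance. */
    abstract static class ReusingServices {
        private DemoStore store;

        /** Hook for subclasses: how the store is actually built. */
        protected abstract DemoStore createStore();

        public synchronized DemoStore getStore() {
            if (store == null) {
                store = createStore();
            }
            return store;
        }
    }

    /** Counts how often the factory hook runs. */
    static final class CountingServices extends ReusingServices {
        final AtomicInteger created = new AtomicInteger();

        @Override
        protected DemoStore createStore() {
            created.incrementAndGet();
            return new DemoStore();
        }
    }

    public static void main(String[] args) {
        CountingServices services = new CountingServices();
        DemoStore first = services.getStore();
        DemoStore second = services.getStore();
        System.out.println(first == second);        // both calls return the same instance
        System.out.println(services.created.get()); // the factory hook ran only once
    }
}
```

Keeping the creation hook abstract and the caching in the base class means each backend only has to say how a component is built, not when.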

  • Field Details

    • logger

      protected final org.slf4j.Logger logger
    • ioExecutor

      protected final Executor ioExecutor
      The executor to run external IO operations on.
    • configuration

      protected final org.apache.flink.configuration.Configuration configuration
      The runtime configuration.
  • Constructor Details

  • Method Details

    • getResourceManagerLeaderRetriever

      public LeaderRetrievalService getResourceManagerLeaderRetriever()
      Description copied from interface: HighAvailabilityServices
      Gets the leader retriever for the cluster's resource manager.
      Specified by:
      getResourceManagerLeaderRetriever in interface HighAvailabilityServices
    • getDispatcherLeaderRetriever

      public LeaderRetrievalService getDispatcherLeaderRetriever()
      Description copied from interface: HighAvailabilityServices
      Gets the leader retriever for the dispatcher. This leader retrieval service is not always accessible.
      Specified by:
      getDispatcherLeaderRetriever in interface HighAvailabilityServices
    • getJobManagerLeaderRetriever

      public LeaderRetrievalService getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID)
      Description copied from interface: HighAvailabilityServices
      Gets the leader retriever for the JobMaster that is responsible for the given job.
      Specified by:
      getJobManagerLeaderRetriever in interface HighAvailabilityServices
      Parameters:
      jobID - The identifier of the job.
      Returns:
      Leader retrieval service to retrieve the job manager for the given job
    • getJobManagerLeaderRetriever

      public LeaderRetrievalService getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID, String defaultJobManagerAddress)
      Description copied from interface: HighAvailabilityServices
      Gets the leader retriever for the JobMaster that is responsible for the given job.
      Specified by:
      getJobManagerLeaderRetriever in interface HighAvailabilityServices
      Parameters:
      jobID - The identifier of the job.
      defaultJobManagerAddress - JobManager address which will be returned by a static leader retrieval service.
      Returns:
      Leader retrieval service to retrieve the job manager for the given job
    • getClusterRestEndpointLeaderRetriever

      public LeaderRetrievalService getClusterRestEndpointLeaderRetriever()
      Description copied from interface: ClientHighAvailabilityServices
      Get the leader retriever for the cluster's rest endpoint.
      Specified by:
      getClusterRestEndpointLeaderRetriever in interface ClientHighAvailabilityServices
      Specified by:
      getClusterRestEndpointLeaderRetriever in interface HighAvailabilityServices
      Returns:
      the leader retriever for cluster's rest endpoint.
    • getResourceManagerLeaderElection

      public LeaderElection getResourceManagerLeaderElection()
      Description copied from interface: HighAvailabilityServices
      Gets the LeaderElection for the cluster's resource manager.
      Specified by:
      getResourceManagerLeaderElection in interface HighAvailabilityServices
    • getDispatcherLeaderElection

      public LeaderElection getDispatcherLeaderElection()
      Description copied from interface: HighAvailabilityServices
      Gets the LeaderElection for the cluster's dispatcher.
      Specified by:
      getDispatcherLeaderElection in interface HighAvailabilityServices
    • getJobManagerLeaderElection

      public LeaderElection getJobManagerLeaderElection(org.apache.flink.api.common.JobID jobID)
      Description copied from interface: HighAvailabilityServices
      Gets the LeaderElection for the job with the given JobID.
      Specified by:
      getJobManagerLeaderElection in interface HighAvailabilityServices
    • getClusterRestEndpointLeaderElection

      public LeaderElection getClusterRestEndpointLeaderElection()
      Description copied from interface: HighAvailabilityServices
      Gets the LeaderElection for the cluster's rest endpoint.
      Specified by:
      getClusterRestEndpointLeaderElection in interface HighAvailabilityServices
    • getCheckpointRecoveryFactory

      public CheckpointRecoveryFactory getCheckpointRecoveryFactory() throws Exception
      Description copied from interface: HighAvailabilityServices
      Gets the checkpoint recovery factory for the job manager.
      Specified by:
      getCheckpointRecoveryFactory in interface HighAvailabilityServices
      Returns:
      Checkpoint recovery factory
      Throws:
      Exception
    • getExecutionPlanStore

      public ExecutionPlanStore getExecutionPlanStore() throws Exception
      Description copied from interface: HighAvailabilityServices
      Gets the submitted execution plan store for the job manager.
      Specified by:
      getExecutionPlanStore in interface HighAvailabilityServices
      Returns:
      Submitted execution plan store
      Throws:
      Exception - if the submitted execution plan store could not be created
    • getJobResultStore

      public JobResultStore getJobResultStore() throws Exception
      Description copied from interface: HighAvailabilityServices
      Gets the store that holds information about the state of finished jobs.
      Specified by:
      getJobResultStore in interface HighAvailabilityServices
      Returns:
      Store of finished job results
      Throws:
      Exception - if job result store could not be created
    • createBlobStore

      public BlobStore createBlobStore()
      Description copied from interface: HighAvailabilityServices
      Creates the BLOB store in which BLOBs are stored in a highly-available fashion.
      Specified by:
      createBlobStore in interface HighAvailabilityServices
      Returns:
      Blob store
    • close

      public void close() throws Exception
      Description copied from interface: HighAvailabilityServices
      Closes the high availability services, releasing all resources.

      This method does not delete or clean up any data stored in external stores (file systems, ZooKeeper, etc). Another instance of the high availability services will be able to recover the job.

      If an exception occurs during closing services, this method will attempt to continue closing other services and report exceptions only after all services have been attempted to be closed.

      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface HighAvailabilityServices
      Throws:
      Exception - Thrown, if an exception occurred while closing these services.
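
The "keep closing, report afterwards" contract described for close() can be sketched in plain Java. This is a minimal illustration under stated assumptions: closeAll and CloseAllSketch are hypothetical names, and attaching later failures as suppressed exceptions is one common way to aggregate them, not necessarily what this class does.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseAllSketch {
    /**
     * Closes every resource even if earlier ones fail. The first failure is
     * rethrown at the end, with any later failures attached as suppressed.
     */
    static void closeAll(List<? extends AutoCloseable> resources) throws Exception {
        Exception first = null;
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) {
        List<String> closed = new ArrayList<>();
        List<AutoCloseable> resources = List.of(
                () -> closed.add("a"),
                () -> { throw new IllegalStateException("b failed"); },
                () -> closed.add("c"));
        try {
            closeAll(resources);
        } catch (Exception e) {
            // "c" was still closed although "b" failed before it.
            System.out.println(e.getMessage() + ", still closed: " + closed);
        }
    }
}
```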
    • cleanupAllData

      public void cleanupAllData() throws Exception
      Description copied from interface: HighAvailabilityServices
      Deletes all data stored by high availability services in external stores.

      After this method has been called, any job or session that was managed by these high availability services will be unrecoverable.

      If an exception occurs during cleanup, this method will attempt to continue the cleanup and report exceptions only after all cleanup steps have been attempted.

      Specified by:
      cleanupAllData in interface HighAvailabilityServices
      Throws:
      Exception - if an error occurred while cleaning up data stored by them.
    • globalCleanupAsync

      public CompletableFuture<Void> globalCleanupAsync(org.apache.flink.api.common.JobID jobID, Executor executor)
      Description copied from interface: GloballyCleanableResource
      globalCleanupAsync is expected to be called from the main thread. Heavy IO tasks should be offloaded to the passed executor. Thread-safety must be ensured.
      Specified by:
      globalCleanupAsync in interface GloballyCleanableResource
      Specified by:
      globalCleanupAsync in interface HighAvailabilityServices
      Parameters:
      jobID - The JobID of the job for which the local data should be cleaned up.
      executor - The fallback executor for IO-heavy operations.
      Returns:
      The cleanup result future.
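
Offloading a blocking cleanup step to the passed executor and returning a future can be sketched as follows. The sketch is self-contained and does not use Flink types: BlockingCleanup, cleanupAsync, and the String job identifier are hypothetical, and wrapping checked exceptions in CompletionException is an assumption about how failures might be surfaced.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncCleanupSketch {
    /** Hypothetical blocking cleanup step that may throw a checked exception. */
    interface BlockingCleanup {
        void run(String jobId) throws Exception;
    }

    /** Runs the blocking step on the given executor and exposes the outcome as a future. */
    static CompletableFuture<Void> cleanupAsync(
            String jobId, Executor executor, BlockingCleanup cleanup) {
        return CompletableFuture.runAsync(
                () -> {
                    try {
                        cleanup.run(jobId);
                    } catch (Exception e) {
                        // Checked exceptions cannot escape a Runnable, so wrap them.
                        throw new CompletionException(e);
                    }
                },
                executor);
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            // The calling thread only waits here for demonstration purposes.
            cleanupAsync("job-1", io, id -> System.out.println("cleaned " + id)).join();
        } finally {
            io.shutdown();
        }
    }
}
```

Because the cleanup step runs on a different thread than the caller, whatever it touches has to be safe for concurrent access, which matches the thread-safety requirement stated above.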
    • createLeaderRetrievalService

      protected abstract LeaderRetrievalService createLeaderRetrievalService(String leaderName)
      Creates a leader retrieval service with the specified leaderName.
      Parameters:
      leaderName - ConfigMap name in Kubernetes or child node path in ZooKeeper.
      Returns:
      A LeaderRetrievalService backed by ZooKeeper or Kubernetes.
    • createCheckpointRecoveryFactory

      protected abstract CheckpointRecoveryFactory createCheckpointRecoveryFactory() throws Exception
      Create the checkpoint recovery factory for the job manager.
      Returns:
      Checkpoint recovery factory
      Throws:
      Exception
    • createExecutionPlanStore

      protected abstract ExecutionPlanStore createExecutionPlanStore() throws Exception
      Create the submitted execution plan store for the job manager.
      Returns:
      Submitted execution plan store
      Throws:
      Exception - if the submitted execution plan store could not be created
    • internalClose

      protected abstract void internalClose() throws Exception
      Closes the components that are used for external operations (e.g. ZooKeeper client, Kubernetes client).
      Throws:
      Exception - if the close operation failed
    • internalCleanup

      protected abstract void internalCleanup() throws Exception
      Cleans up the metadata in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap).

      If an exception occurs during internal cleanup, the cleanup continues in cleanupAllData() and exceptions are reported only after all cleanup steps have been attempted.

      Throws:
      Exception - if the cleanup operation on external storage fails.
    • internalCleanupJobData

      protected abstract void internalCleanupJobData(org.apache.flink.api.common.JobID jobID) throws Exception
      Cleans up the metadata in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap) for the specified job. Method implementations need to be thread-safe.
      Parameters:
      jobID - The identifier of the job to clean up.
      Throws:
      Exception - if the cleanup operation on external storage fails.
    • getLeaderPathForResourceManager

      protected abstract String getLeaderPathForResourceManager()
      Get the leader path for ResourceManager.
      Returns:
      The ResourceManager leader name. It is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
    • getLeaderPathForDispatcher

      protected abstract String getLeaderPathForDispatcher()
      Get the leader path for Dispatcher.
      Returns:
      The Dispatcher leader name. It is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
    • getLeaderPathForJobManager

      protected abstract String getLeaderPathForJobManager(org.apache.flink.api.common.JobID jobID)
      Get the leader path for a specific JobManager.
      Parameters:
      jobID - The identifier of the job.
      Returns:
      The JobManager leader name for the specified job ID. It is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
    • getLeaderPathForRestServer

      protected abstract String getLeaderPathForRestServer()
      Get the leader path for RestServer.
      Returns:
      The RestServer leader name. It is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
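
The interplay between the getLeaderPathFor...() hooks and createLeaderRetrievalService(String) can be sketched with stand-in types. Everything here is hypothetical: BaseServices and ChildPathServices are not Flink classes, job identifiers are plain strings, the "service" is just a descriptive String, and the path names are invented for illustration rather than the names Flink actually uses.

```java
final class LeaderPathSketch {
    /** Hypothetical stand-in for the abstract base: it only combines the hooks. */
    abstract static class BaseServices {
        protected abstract String getLeaderPathForResourceManager();

        protected abstract String getLeaderPathForJobManager(String jobId);

        /** Stand-in for createLeaderRetrievalService; returns a description, not a service. */
        protected abstract String createLeaderRetrievalService(String leaderName);

        public final String getResourceManagerLeaderRetriever() {
            return createLeaderRetrievalService(getLeaderPathForResourceManager());
        }

        public final String getJobManagerLeaderRetriever(String jobId) {
            return createLeaderRetrievalService(getLeaderPathForJobManager(jobId));
        }
    }

    /** Hypothetical backend that uses child-node style names (invented for this sketch). */
    static final class ChildPathServices extends BaseServices {
        @Override
        protected String getLeaderPathForResourceManager() {
            return "/demo-resource-manager";
        }

        @Override
        protected String getLeaderPathForJobManager(String jobId) {
            return "/demo-job-" + jobId;
        }

        @Override
        protected String createLeaderRetrievalService(String leaderName) {
            return "retriever[" + leaderName + "]";
        }
    }

    public static void main(String[] args) {
        BaseServices services = new ChildPathServices();
        System.out.println(services.getResourceManagerLeaderRetriever());
        System.out.println(services.getJobManagerLeaderRetriever("42"));
    }
}
```

In this arrangement the base class owns the public getters, while a backend supplies only the naming scheme and the factory for the concrete service.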