Interface HighAvailabilityServices

All Superinterfaces:
AutoCloseable, ClientHighAvailabilityServices, GloballyCleanableResource
All Known Implementing Classes:
AbstractHaServices, AbstractNonHaServices, EmbeddedHaServices, EmbeddedHaServicesWithLeadershipControl, StandaloneHaServices, ZooKeeperLeaderElectionHaServices

public interface HighAvailabilityServices extends ClientHighAvailabilityServices, GloballyCleanableResource
The HighAvailabilityServices give access to all services needed for a highly-available setup. In particular, the services provide access to highly available storage and registries, as well as distributed counters and leader election.
  • ResourceManager leader election and leader retrieval
  • JobManager leader election and leader retrieval
  • Persistence for checkpoint metadata
  • Registering the latest completed checkpoint(s)
  • Persistence for the BLOB store
  • Registry that marks a job's status
  • Naming of RPC endpoints
  • Field Details

    • DEFAULT_LEADER_ID

      static final UUID DEFAULT_LEADER_ID
      This UUID should be used when no proper leader election happens and a simple pre-configured leader is used instead. That is the case, for example, in non-highly-available standalone setups.
    • DEFAULT_JOB_ID

      static final org.apache.flink.api.common.JobID DEFAULT_JOB_ID
      This JobID should be used to identify the old JobManager when using the HighAvailabilityServices. With the new mode, every JobMaster will have a distinct JobID assigned.
  • Method Details

    • getResourceManagerLeaderRetriever

      LeaderRetrievalService getResourceManagerLeaderRetriever()
      Gets the leader retriever for the cluster's resource manager.
    • getDispatcherLeaderRetriever

      LeaderRetrievalService getDispatcherLeaderRetriever()
      Gets the leader retriever for the dispatcher. This leader retrieval service is not always accessible.
    • getJobManagerLeaderRetriever

      @Deprecated LeaderRetrievalService getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID)
      Deprecated.
      This method should only be used by the legacy code where the JobManager acts as the master.
      Gets the leader retriever for the JobMaster that is responsible for the given job.
      Parameters:
      jobID - The identifier of the job.
      Returns:
      Leader retrieval service to retrieve the job manager for the given job
    • getJobManagerLeaderRetriever

      LeaderRetrievalService getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID, String defaultJobManagerAddress)
      Gets the leader retriever for the JobMaster that is responsible for the given job.
      Parameters:
      jobID - The identifier of the job.
      defaultJobManagerAddress - JobManager address which will be returned by a static leader retrieval service.
      Returns:
      Leader retrieval service to retrieve the job manager for the given job
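      The role of defaultJobManagerAddress can be illustrated with a minimal, self-contained sketch. It does not use the Flink classes: Listener and StaticRetriever are hypothetical stand-ins, the all-zero UUID is an assumed placeholder, and the address string in main is made up. The sketch only shows the idea that a static retrieval service reports one fixed, pre-configured address instead of the result of an election.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

public class StaticLeaderRetrievalSketch {

    /** Hypothetical stand-in for a leader retrieval listener; not the Flink type. */
    public interface Listener {
        void notifyLeaderAddress(String leaderAddress, UUID leaderSessionId);
    }

    /** Hypothetical stand-in for a retrieval service that always reports one fixed leader. */
    public static final class StaticRetriever {
        private final String address;
        private final UUID leaderId;

        public StaticRetriever(String address, UUID leaderId) {
            this.address = address;
            this.leaderId = leaderId;
        }

        /** Notifies the listener once, immediately, with the pre-configured leader. */
        public void start(Listener listener) {
            listener.notifyLeaderAddress(address, leaderId);
        }
    }

    /** Starts a static retriever and returns the address the listener observed. */
    public static String retrieveOnce(String defaultJobManagerAddress) {
        // Assumed placeholder leader ID, used here for illustration only.
        UUID placeholderLeaderId = new UUID(0L, 0L);
        AtomicReference<String> seen = new AtomicReference<>();
        new StaticRetriever(defaultJobManagerAddress, placeholderLeaderId)
                .start((address, id) -> seen.set(address));
        return seen.get();
    }

    public static void main(String[] args) {
        // Made-up address; any string would be passed through unchanged.
        System.out.println(retrieveOnce("jobmanager-host:6123"));
    }
}
```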
    • getWebMonitorLeaderRetriever

      @Deprecated default LeaderRetrievalService getWebMonitorLeaderRetriever()
      Deprecated.
      Use getClusterRestEndpointLeaderRetriever() instead of this method.
      This retriever should no longer be used on the cluster side. The web monitor retriever is only required on the client side, and there is a dedicated high-availability service for the client, named ClientHighAvailabilityServices. See also FLINK-13750.
      Returns:
      the leader retriever for web monitor
    • getResourceManagerLeaderElection

      LeaderElection getResourceManagerLeaderElection()
      Gets the LeaderElection for the cluster's resource manager.
    • getDispatcherLeaderElection

      LeaderElection getDispatcherLeaderElection()
      Gets the LeaderElection for the cluster's dispatcher.
    • getJobManagerLeaderElection

      LeaderElection getJobManagerLeaderElection(org.apache.flink.api.common.JobID jobID)
      Gets the LeaderElection for the job with the given JobID.
    • getWebMonitorLeaderElection

      @Deprecated default LeaderElection getWebMonitorLeaderElection()
      Deprecated.
      Gets the LeaderElection for the cluster's rest endpoint.
    • getCheckpointRecoveryFactory

      CheckpointRecoveryFactory getCheckpointRecoveryFactory() throws Exception
      Gets the checkpoint recovery factory for the job manager.
      Returns:
      Checkpoint recovery factory
      Throws:
      Exception
    • getExecutionPlanStore

      ExecutionPlanStore getExecutionPlanStore() throws Exception
      Gets the submitted execution plan store for the job manager.
      Returns:
      Submitted execution plan store
      Throws:
      Exception - if the submitted execution plan store could not be created
    • getJobResultStore

      JobResultStore getJobResultStore() throws Exception
      Gets the store that holds information about the state of finished jobs.
      Returns:
      Store of finished job results
      Throws:
      Exception - if job result store could not be created
    • createBlobStore

      BlobStore createBlobStore() throws IOException
      Creates the BLOB store in which BLOBs are stored in a highly-available fashion.
      Returns:
      Blob store
      Throws:
      IOException - if the blob store could not be created
    • getClusterRestEndpointLeaderElection

      default LeaderElection getClusterRestEndpointLeaderElection()
      Gets the LeaderElection for the cluster's rest endpoint.
    • getClusterRestEndpointLeaderRetriever

      default LeaderRetrievalService getClusterRestEndpointLeaderRetriever()
      Description copied from interface: ClientHighAvailabilityServices
      Get the leader retriever for the cluster's rest endpoint.
      Specified by:
      getClusterRestEndpointLeaderRetriever in interface ClientHighAvailabilityServices
      Returns:
      the leader retriever for the cluster's rest endpoint.
    • close

      void close() throws Exception
      Closes the high availability services, releasing all resources.

      This method does not delete or clean up any data stored in external stores (file systems, ZooKeeper, etc). Another instance of the high availability services will be able to recover the job.

      If an exception occurs while closing a service, this method will attempt to continue closing the other services and will report exceptions only after it has attempted to close all of them.

      Specified by:
      close in interface AutoCloseable
      Throws:
      Exception - Thrown, if an exception occurred while closing these services.
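      The "keep closing, report afterwards" behavior described above could look roughly like the following sketch. It is not the code of any implementing class: closeAll and demo are hypothetical helpers, and the fake services are plain AutoCloseable lambdas. The first failure is remembered and later failures are attached as suppressed exceptions, which is one common way to report several errors at once.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseAllSketch {

    /** Closes every service; throws only after all of them have been attempted. */
    public static void closeAll(List<? extends AutoCloseable> services) throws Exception {
        Exception firstFailure = null;
        for (AutoCloseable service : services) {
            try {
                service.close();
            } catch (Exception e) {
                if (firstFailure == null) {
                    firstFailure = e;              // remember the first error
                } else {
                    firstFailure.addSuppressed(e); // attach later errors to it
                }
            }
        }
        if (firstFailure != null) {
            throw firstFailure;
        }
    }

    /** Closes three fake services, the second of which fails, and returns what happened. */
    public static List<String> demo() {
        List<String> events = new ArrayList<>();
        List<AutoCloseable> services = new ArrayList<>();
        services.add(() -> events.add("leader-election"));
        services.add(() -> {
            throw new IllegalStateException("blob store failed to close");
        });
        services.add(() -> events.add("checkpoint-store"));
        try {
            closeAll(services);
        } catch (Exception e) {
            events.add("reported: " + e.getMessage());
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

      In this sketch the third service is still closed even though the second one failed, and the failure surfaces only at the end.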
    • cleanupAllData

      void cleanupAllData() throws Exception
      Deletes all data stored by high availability services in external stores.

      After this method has been called, any job or session that was managed by these high availability services will be unrecoverable.

      If an exception occurs during cleanup, this method will attempt to continue the cleanup and report exceptions only after all cleanup steps have been attempted.

      Throws:
      Exception - if an error occurred while cleaning up data stored by the high availability services.
    • closeWithOptionalClean

      default void closeWithOptionalClean(boolean cleanupData) throws Exception
      Calls cleanupAllData() (if true is passed as the cleanupData parameter) before calling close() on this instance. Any error that occurred during the cleanup will be propagated after close() has been called.
      Throws:
      Exception
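      The ordering described above (optional cleanup first, then close(), with a cleanup error surfacing only afterwards) can be sketched as follows. This is an illustration under assumptions, not the interface's actual default method: Services is a hypothetical stand-in with only the two relevant calls, and trace is a helper added to make the call order observable.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseWithOptionalCleanSketch {

    /** Hypothetical stand-in exposing only the two calls the description mentions. */
    public interface Services {
        void cleanupAllData() throws Exception;
        void close() throws Exception;
    }

    public static void closeWithOptionalClean(Services services, boolean cleanupData)
            throws Exception {
        Exception cleanupFailure = null;
        if (cleanupData) {
            try {
                services.cleanupAllData();
            } catch (Exception e) {
                cleanupFailure = e; // hold on to it so that close() still runs
            }
        }
        try {
            services.close();
        } catch (Exception closeFailure) {
            if (cleanupFailure != null) {
                cleanupFailure.addSuppressed(closeFailure);
                throw cleanupFailure;
            }
            throw closeFailure;
        }
        if (cleanupFailure != null) {
            throw cleanupFailure; // propagated only after close() was called
        }
    }

    /** Runs the sketch against a recording fake and returns the observed call order. */
    public static List<String> trace(boolean cleanupData, boolean failCleanup) {
        List<String> calls = new ArrayList<>();
        Services fake = new Services() {
            @Override
            public void cleanupAllData() {
                calls.add("cleanupAllData");
                if (failCleanup) {
                    throw new IllegalStateException("cleanup failed");
                }
            }

            @Override
            public void close() {
                calls.add("close");
            }
        };
        try {
            closeWithOptionalClean(fake, cleanupData);
        } catch (Exception e) {
            calls.add("thrown: " + e.getMessage());
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(trace(true, true));
    }
}
```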
    • globalCleanupAsync

      default CompletableFuture<Void> globalCleanupAsync(org.apache.flink.api.common.JobID jobId, Executor executor)
      Description copied from interface: GloballyCleanableResource
      globalCleanupAsync is expected to be called from the main thread. Heavy IO tasks should be offloaded to the passed executor. Thread-safety must be ensured.
      Specified by:
      globalCleanupAsync in interface GloballyCleanableResource
      Parameters:
      jobId - The JobID of the job for which the data should be cleaned up.
      executor - The fallback executor for IO-heavy operations.
      Returns:
      The cleanup result future.
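      The threading contract above (return a future right away, run heavy IO on the supplied executor) might be sketched like this. The sketch makes several assumptions: the job ID is a plain String rather than a JobID, deleteJobData is a hypothetical callback standing in for the real cleanup work, and cleanupThreadName exists only to show which thread did that work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class GlobalCleanupSketch {

    /** Returns immediately; the (simulated) heavy IO runs on the supplied executor. */
    public static CompletableFuture<Void> globalCleanupAsync(
            String jobId, Executor executor, Consumer<String> deleteJobData) {
        return CompletableFuture.runAsync(() -> deleteJobData.accept(jobId), executor);
    }

    /** Runs a cleanup on a dedicated executor and returns the name of the thread that did the work. */
    public static String cleanupThreadName() {
        ExecutorService ioExecutor =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "io-executor"));
        try {
            AtomicReference<String> threadName = new AtomicReference<>();
            globalCleanupAsync(
                            "job-1",
                            ioExecutor,
                            id -> threadName.set(Thread.currentThread().getName()))
                    .join(); // wait here only for the demo; a real caller would chain on the future
            return threadName.get();
        } finally {
            ioExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(cleanupThreadName());
    }
}
```

      Blocking with join() is done here only to read the result; on a main thread that must stay responsive, composing further stages on the returned future would be the more natural choice.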