Package org.apache.flink.runtime.shuffle
Class NettyShuffleMaster
java.lang.Object
org.apache.flink.runtime.shuffle.NettyShuffleMaster
- All Implemented Interfaces:
AutoCloseable, ShuffleMaster<NettyShuffleDescriptor>
Default ShuffleMaster for netty and local file based shuffle implementation.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
void | close() | Closes this shuffle master service which should release all resources.
org.apache.flink.configuration.MemorySize | computeShuffleMemorySizeForTask(TaskInputsOutputsDescriptor desc) | JM announces the network memory requirement based on the result of this method.
CompletableFuture<Collection<PartitionWithMetrics>> | getPartitionWithMetrics(org.apache.flink.api.common.JobID jobId, Duration timeout, Set<ResultPartitionID> expectedPartitions) | Retrieves the specified partitions (identified by expectedPartitions) and their metrics; the metrics include the sizes of sub-partitions in a result partition.
Optional<ShuffleDescriptor> | getShuffleDescriptor(org.apache.flink.api.common.JobID jobID, ResultPartitionID resultPartitionID) |
void | notifyPartitionRecoveryStarted(org.apache.flink.api.common.JobID jobId) | Notifies that the recovery process of result partitions has started.
void | registerJob(JobShuffleContext context) | Registers the target job together with the corresponding JobShuffleContext to this shuffle master.
CompletableFuture<NettyShuffleDescriptor> | registerPartitionWithProducer(org.apache.flink.api.common.JobID jobID, PartitionDescriptor partitionDescriptor, ProducerDescriptor producerDescriptor) | Asynchronously registers a partition and its producer with the shuffle service.
void | releasePartitionExternally(ShuffleDescriptor shuffleDescriptor) | Releases any external resources occupied by the given partition.
void | restoreState(List<ShuffleMasterSnapshot> snapshots, org.apache.flink.api.common.JobID jobId) | Restores the state of the shuffle master from the provided snapshots for the specified job.
void | restoreState(ShuffleMasterSnapshot snapshot) | Restores the state of the shuffle master from the provided snapshot.
void | snapshotState(CompletableFuture<ShuffleMasterSnapshot> snapshotFuture) | Triggers a snapshot of the shuffle master's state.
void | snapshotState(CompletableFuture<ShuffleMasterSnapshot> snapshotFuture, ShuffleMasterSnapshotContext context, org.apache.flink.api.common.JobID jobId) | Triggers a snapshot of the shuffle master's state related to the specified job.
boolean | supportsBatchSnapshot() | Whether the shuffle master supports taking snapshots in batch scenarios if BatchExecutionOptions.JOB_RECOVERY_ENABLED is true.
void | unregisterJob(org.apache.flink.api.common.JobID jobId) | Unregisters the target job from this shuffle master, which means the corresponding job has reached a global termination state and all the allocated resources except for the cluster partitions can be cleared.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.flink.runtime.shuffle.ShuffleMaster
start
-
Constructor Details
-
NettyShuffleMaster
-
-
Method Details
-
registerPartitionWithProducer
public CompletableFuture<NettyShuffleDescriptor> registerPartitionWithProducer(org.apache.flink.api.common.JobID jobID, PartitionDescriptor partitionDescriptor, ProducerDescriptor producerDescriptor)
Description copied from interface: ShuffleMaster
Asynchronously registers a partition and its producer with the shuffle service.
The returned shuffle descriptor is an internal handle which identifies the partition within the shuffle service. The descriptor should provide enough information to read data from or write data to the partition.
- Specified by:
registerPartitionWithProducer in interface ShuffleMaster<NettyShuffleDescriptor>
- Parameters:
jobID - job ID of the corresponding job which registered the partition
partitionDescriptor - general job graph information about the partition
producerDescriptor - general producer information (location, execution id, connection info)
- Returns:
future with the partition shuffle descriptor used for producer/consumer deployment and their data exchange.
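The asynchronous contract described above can be illustrated with a minimal sketch. It does not use the Flink classes: FakeDescriptor, FakeShuffleMaster and ImmediateShuffleMaster are hypothetical stand-ins that mirror only the shape of registerPartitionWithProducer, and completing the future immediately is an assumption of the sketch, not a statement about NettyShuffleMaster.

```java
import java.util.concurrent.CompletableFuture;

public class RegisterPartitionSketch {

    // Hypothetical stand-in for a shuffle descriptor (not Flink's NettyShuffleDescriptor).
    static final class FakeDescriptor {
        final String partitionId;
        final String producerLocation;

        FakeDescriptor(String partitionId, String producerLocation) {
            this.partitionId = partitionId;
            this.producerLocation = producerLocation;
        }
    }

    // Hypothetical stand-in that mirrors only the shape of registerPartitionWithProducer.
    interface FakeShuffleMaster {
        CompletableFuture<FakeDescriptor> registerPartitionWithProducer(
                String jobId, String partitionId, String producerLocation);
    }

    // Assumption of this sketch: registration needs no remote call, so the
    // future is already complete when it is returned.
    static final class ImmediateShuffleMaster implements FakeShuffleMaster {
        @Override
        public CompletableFuture<FakeDescriptor> registerPartitionWithProducer(
                String jobId, String partitionId, String producerLocation) {
            return CompletableFuture.completedFuture(
                    new FakeDescriptor(partitionId, producerLocation));
        }
    }

    // Chains the "deployment" step on the future; join() is used only to get
    // a plain value out for the demo.
    static String deploy(FakeShuffleMaster master) {
        return master
                .registerPartitionWithProducer("job-1", "partition-1", "host-a")
                .thenApply(d -> "deploy " + d.partitionId + " on " + d.producerLocation)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(deploy(new ImmediateShuffleMaster()));
    }
}
```

Because the caller only depends on the future, the same calling code would work with a shuffle service that needs a remote round trip before the descriptor is available.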
-
releasePartitionExternally
public void releasePartitionExternally(ShuffleDescriptor shuffleDescriptor)
Description copied from interface: ShuffleMaster
Releases any external resources occupied by the given partition.
This call triggers the release of any resources which are occupied by the given partition in external systems outside of the producer executor. This is mostly relevant for batch jobs and blocking result partitions. The producer-local resources are managed by ShuffleDescriptor.storesLocalResourcesOn() and ShuffleEnvironment.releasePartitionsLocally(Collection).
- Specified by:
releasePartitionExternally in interface ShuffleMaster<NettyShuffleDescriptor>
- Parameters:
shuffleDescriptor - shuffle descriptor of the result partition to release externally.
-
getShuffleDescriptor
public Optional<ShuffleDescriptor> getShuffleDescriptor(org.apache.flink.api.common.JobID jobID, ResultPartitionID resultPartitionID)
-
computeShuffleMemorySizeForTask
public org.apache.flink.configuration.MemorySize computeShuffleMemorySizeForTask(TaskInputsOutputsDescriptor desc)
The JobManager (JM) announces the network memory requirement based on the result of this method. Note that the calculation depends on both the I/O details of a vertex and the network configuration. In fine-grained resource management, the configuration must therefore be kept consistent between the JM, the ResourceManager (RM) and the TaskManager (TM), so that the memory that is announced matches the memory that is allocated.
- Specified by:
computeShuffleMemorySizeForTask in interface ShuffleMaster<NettyShuffleDescriptor>
- Parameters:
desc - describes task inputs and outputs information for shuffle memory calculation.
- Returns:
shuffle memory size for a task with the given TaskInputsOutputsDescriptor.
-
getPartitionWithMetrics
public CompletableFuture<Collection<PartitionWithMetrics>> getPartitionWithMetrics(org.apache.flink.api.common.JobID jobId, Duration timeout, Set<ResultPartitionID> expectedPartitions)
Description copied from interface: ShuffleMaster
Retrieves the specified partitions (identified by expectedPartitions) and their metrics. The metrics include the sizes of the sub-partitions in a result partition.
- Specified by:
getPartitionWithMetrics in interface ShuffleMaster<NettyShuffleDescriptor>
- Parameters:
jobId - ID of the target job
timeout - The timeout used for retrieving the specified partitions.
expectedPartitions - The set of identifiers for the result partitions whose metrics are to be fetched.
- Returns:
A future that will contain a collection of the partitions, with their metrics, that could be retrieved from the expected partitions within the specified timeout period.
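Since the returned collection only contains the partitions that could be retrieved within the timeout, a caller may need to work out which expected partitions are absent. The following is a minimal sketch of that check. It uses plain strings and a map as hypothetical stand-ins for ResultPartitionID and PartitionWithMetrics; the class and method names are illustrative and not part of Flink.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;

public class PartitionMetricsSketch {

    // Hypothetical model: the future yields "partition id -> total size in bytes"
    // for the partitions that could be retrieved in time.
    static Set<String> missingPartitions(
            Set<String> expectedPartitions,
            CompletableFuture<Map<String, Long>> metricsFuture) {
        // join() blocks until the future completes; used here only for the demo.
        Map<String, Long> retrieved = metricsFuture.join();

        // Everything that was expected but not returned is treated as missing.
        Set<String> missing = new TreeSet<>(expectedPartitions);
        missing.removeAll(retrieved.keySet());
        return missing;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("p1", "p2", "p3");
        // Assumed result: only p1 and p3 were retrieved within the timeout.
        CompletableFuture<Map<String, Long>> future =
                CompletableFuture.completedFuture(Map.of("p1", 10L, "p3", 5L));
        System.out.println(missingPartitions(expected, future));
    }
}
```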
-
registerJob
public void registerJob(JobShuffleContext context)
Description copied from interface: ShuffleMaster
Registers the target job together with the corresponding JobShuffleContext to this shuffle master. Through the shuffle context, one can obtain basic information such as the job ID and the job configuration. The context also enables the ShuffleMaster to notify the JobMaster about lost result partitions, so that the JobMaster can identify and reproduce unavailable partitions earlier.
- Specified by:
registerJob in interface ShuffleMaster<NettyShuffleDescriptor>
- Parameters:
context - the corresponding shuffle context of the target job.
-
unregisterJob
public void unregisterJob(org.apache.flink.api.common.JobID jobId)
Description copied from interface: ShuffleMaster
Unregisters the target job from this shuffle master, which means the corresponding job has reached a global termination state and all the allocated resources except for the cluster partitions can be cleared.
- Specified by:
unregisterJob in interface ShuffleMaster<NettyShuffleDescriptor>
- Parameters:
jobId - ID of the target job to be unregistered.
-
supportsBatchSnapshot
public boolean supportsBatchSnapshot()
Description copied from interface: ShuffleMaster
Whether the shuffle master supports taking snapshots in batch scenarios if BatchExecutionOptions.JOB_RECOVERY_ENABLED is true. If it returns true, Flink will call ShuffleMaster.snapshotState(java.util.concurrent.CompletableFuture<org.apache.flink.runtime.shuffle.ShuffleMasterSnapshot>) to take a snapshot, and call ShuffleMaster.restoreState(org.apache.flink.runtime.shuffle.ShuffleMasterSnapshot) to restore the state of the shuffle master.
- Specified by:
supportsBatchSnapshot in interface ShuffleMaster<NettyShuffleDescriptor>
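The snapshot and restore flow described above can be sketched as follows. The sketch does not use the Flink types: SnapshotCapableMaster and InMemoryMaster are hypothetical stand-ins, and a plain String takes the place of ShuffleMasterSnapshot. What it keeps from the documented signatures is that snapshotState receives a future which the shuffle master completes, rather than returning one.

```java
import java.util.concurrent.CompletableFuture;

public class BatchSnapshotSketch {

    // Hypothetical stand-in mirroring the shape of the three documented methods.
    interface SnapshotCapableMaster {
        boolean supportsBatchSnapshot();

        void snapshotState(CompletableFuture<String> snapshotFuture);

        void restoreState(String snapshot);
    }

    // Toy implementation whose whole state is a single string.
    static final class InMemoryMaster implements SnapshotCapableMaster {
        private String state;

        InMemoryMaster(String state) {
            this.state = state;
        }

        @Override
        public boolean supportsBatchSnapshot() {
            return true;
        }

        @Override
        public void snapshotState(CompletableFuture<String> snapshotFuture) {
            // The master completes the future handed in by the caller.
            snapshotFuture.complete(state);
        }

        @Override
        public void restoreState(String snapshot) {
            this.state = snapshot;
        }

        String currentState() {
            return state;
        }
    }

    // Caller side: snapshot only when supported, then restore into a fresh master.
    static String snapshotAndRestore(InMemoryMaster source, InMemoryMaster target) {
        if (!source.supportsBatchSnapshot()) {
            return target.currentState();
        }
        CompletableFuture<String> snapshotFuture = new CompletableFuture<>();
        source.snapshotState(snapshotFuture);
        target.restoreState(snapshotFuture.join());
        return target.currentState();
    }

    public static void main(String[] args) {
        InMemoryMaster before = new InMemoryMaster("partitions=[p1,p2]");
        InMemoryMaster after = new InMemoryMaster("");
        System.out.println(snapshotAndRestore(before, after));
    }
}
```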
-
snapshotState
public void snapshotState(CompletableFuture<ShuffleMasterSnapshot> snapshotFuture, ShuffleMasterSnapshotContext context, org.apache.flink.api.common.JobID jobId)
Description copied from interface: ShuffleMaster
Triggers a snapshot of the shuffle master's state related to the specified job.
- Specified by:
snapshotState in interface ShuffleMaster<NettyShuffleDescriptor>
-
snapshotState
public void snapshotState(CompletableFuture<ShuffleMasterSnapshot> snapshotFuture)
Description copied from interface: ShuffleMaster
Triggers a snapshot of the shuffle master's state.
- Specified by:
snapshotState in interface ShuffleMaster<NettyShuffleDescriptor>
-
restoreState
public void restoreState(ShuffleMasterSnapshot snapshot)
Description copied from interface: ShuffleMaster
Restores the state of the shuffle master from the provided snapshot.
- Specified by:
restoreState in interface ShuffleMaster<NettyShuffleDescriptor>
-
restoreState
public void restoreState(List<ShuffleMasterSnapshot> snapshots, org.apache.flink.api.common.JobID jobId)
Description copied from interface: ShuffleMaster
Restores the state of the shuffle master from the provided snapshots for the specified job.
- Specified by:
restoreState in interface ShuffleMaster<NettyShuffleDescriptor>
-
notifyPartitionRecoveryStarted
public void notifyPartitionRecoveryStarted(org.apache.flink.api.common.JobID jobId)
Description copied from interface: ShuffleMaster
Notifies that the recovery process of result partitions has started.
- Specified by:
notifyPartitionRecoveryStarted in interface ShuffleMaster<NettyShuffleDescriptor>
- Parameters:
jobId - ID of the target job
-
close
public void close() throws Exception
Description copied from interface: ShuffleMaster
Closes this shuffle master service, which should release all resources. A shuffle master will only be closed when the cluster is shut down.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface ShuffleMaster<NettyShuffleDescriptor>
- Throws:
Exception
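Because the class implements AutoCloseable and close() declares a checked Exception, any code that owns such a service has to handle that exception on shutdown. The following is a minimal sketch of this pattern with a hypothetical FakeMaster; it does not show how a Flink cluster actually manages the shuffle master's lifecycle.

```java
public class CloseSketch {

    // Hypothetical stand-in for a closeable service; it only records that close() ran.
    static final class FakeMaster implements AutoCloseable {
        private boolean closed;

        @Override
        public void close() throws Exception {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // try-with-resources calls close() when the block exits, including on failure.
    static boolean closeOnShutdown(FakeMaster master) {
        try (FakeMaster owned = master) {
            // Work that needs the service for the lifetime of the "cluster" goes here.
        } catch (Exception e) {
            // close() declares a checked Exception, so it must be handled or rethrown.
            throw new IllegalStateException("Failed to close the shuffle master", e);
        }
        return master.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(closeOnShutdown(new FakeMaster()));
    }
}
```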
-