Class FileSystemCommitter
Commits data to a FileSystem table in batch mode.
Data consistency:
1. Task failure: a new task is launched and creates a PartitionTempFileManager, which cleans up the temporary files left by the previous attempt. This simple design makes it easy to delete a task's invalid temporary directory, but it also means the directory layout does not support running multiple backup attempts of the same task at once.
2. Job master commit failure in overwrite mode: this may leave unfinished intermediate results, but rerunning the job produces a correct final result, because the intermediate results are overwritten.
3. Job master commit failure in append mode: this can lead to inconsistent data. However, the commit action is a single point of execution that only moves files and updates metadata, so it is fast and the probability of inconsistency is relatively small.
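The difference between a failed overwrite commit and a failed append commit can be illustrated with a minimal sketch. This is not Flink's implementation: the class CommitSketch and its commit method are hypothetical, and plain java.nio.file directories stand in for the table's temporary and final locations. The assumption is only what the text above states, namely that committing moves files out of a temporary directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for the commit step; not Flink's implementation. */
public class CommitSketch {

    /** Lists the regular files directly inside dir, collected before any mutation. */
    private static List<Path> regularFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    /**
     * Moves every regular file in tmpDir into destDir and returns the new locations.
     * With overwrite=true, files already in destDir are deleted first, so a rerun
     * after a partial commit replaces whatever the failed run left behind.
     * With overwrite=false (append), files moved by a failed run stay in destDir.
     */
    public static List<Path> commit(Path tmpDir, Path destDir, boolean overwrite)
            throws IOException {
        Files.createDirectories(destDir);
        if (overwrite) {
            for (Path existing : regularFiles(destDir)) {
                Files.delete(existing);
            }
        }
        List<Path> moved = new ArrayList<>();
        for (Path src : regularFiles(tmpDir)) {
            Path target = destDir.resolve(src.getFileName());
            Files.move(src, target, StandardCopyOption.REPLACE_EXISTING);
            moved.add(target);
        }
        return moved;
    }
}
```

In this sketch, a crash partway through the move loop leaves destDir holding only some of the new files. An overwrite rerun clears them before moving again, whereas an append rerun has no way to tell them apart from earlier, valid data.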
Constructor Summary
Constructors
Constructor    Description
FileSystemCommitter(FileSystemFactory factory, TableMetaStoreFactory metaStoreFactory, boolean overwrite, org.apache.flink.core.fs.Path tmpPath, int partitionColumnSize, boolean isToLocal, org.apache.flink.table.catalog.ObjectIdentifier identifier, LinkedHashMap<String, String> staticPartitions, List<PartitionCommitPolicy> policies)
Method Summary
Modifier and Type    Method    Description
void commitPartitions()
    Commits the job's output after the batch job completes successfully.
void commitPartitions(BiPredicate<Integer, Integer> taskAttemptFilter)
    Commits the partitions, using a filter to exclude invalid task attempt files.
void commitPartitionsWithFiles(Map<String, List<org.apache.flink.core.fs.Path>> partitionsFiles)
    Commits the job's output after the batch job completes successfully, using the given partitions and the files written for each; that is, it moves the temporary files to each partition's location.
Constructor Details
FileSystemCommitter
public FileSystemCommitter(FileSystemFactory factory, TableMetaStoreFactory metaStoreFactory, boolean overwrite, org.apache.flink.core.fs.Path tmpPath, int partitionColumnSize, boolean isToLocal, org.apache.flink.table.catalog.ObjectIdentifier identifier, LinkedHashMap<String, String> staticPartitions, List<PartitionCommitPolicy> policies)
Method Details
commitPartitions
Commits the job's output after the batch job completes successfully.
Throws:
Exception
commitPartitions
Commits the partitions, using a filter to exclude invalid task attempt files. In speculative execution mode, some files may not belong to the finished attempt.
Parameters:
taskAttemptFilter - the filter that accepts subtaskIndex and attemptNumber
Throws:
Exception - if partition commitment fails
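As an illustration of what a taskAttemptFilter might look like, the following minimal sketch builds a BiPredicate<Integer, Integer> from a map of subtask index to finished attempt number. The class AttemptFilterSketch, the method finishedAttemptsOnly, and the idea of tracking finished attempts in a map are assumptions made for this example; only the (subtaskIndex, attemptNumber) argument order comes from the parameter description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical helper for building a task attempt filter; not part of Flink. */
public class AttemptFilterSketch {

    /**
     * Returns a filter over (subtaskIndex, attemptNumber) that accepts a pair only
     * when attemptNumber is the attempt recorded as finished for that subtask.
     * Subtasks missing from the map are rejected.
     */
    public static BiPredicate<Integer, Integer> finishedAttemptsOnly(
            Map<Integer, Integer> finishedAttempts) {
        // Defensive copy so later changes to the caller's map do not alter the filter.
        Map<Integer, Integer> snapshot = new HashMap<>(finishedAttempts);
        return (subtaskIndex, attemptNumber) ->
                attemptNumber != null && attemptNumber.equals(snapshot.get(subtaskIndex));
    }
}
```

A filter built this way could then be passed as the taskAttemptFilter argument of commitPartitions, so that files written by superseded speculative attempts are left out of the commit.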
commitPartitionsWithFiles
public void commitPartitionsWithFiles(Map<String, List<org.apache.flink.core.fs.Path>> partitionsFiles) throws Exception
Commits the job's output after the batch job completes successfully, using the given partitions and the files written for each; that is, it moves the temporary files to each partition's location.
Throws:
Exception