Class BlockSplittingRecursiveEnumerator
java.lang.Object
org.apache.flink.connector.file.src.enumerate.NonSplittingRecursiveEnumerator
org.apache.flink.connector.file.src.enumerate.BlockSplittingRecursiveEnumerator
- All Implemented Interfaces:
FileEnumerator
- Direct Known Subclasses:
BlockSplittingRecursiveAllDirEnumerator
@PublicEvolving
public class BlockSplittingRecursiveEnumerator
extends NonSplittingRecursiveEnumerator
This FileEnumerator enumerates all files under the given paths recursively and creates a separate split for each file block.
Please note that file blocks are only exposed by some file systems, such as HDFS. For file systems that do not expose block information, the enumerator does not create multiple splits per file; each file is kept as a single source split.
Files with suffixes corresponding to known compression formats (for example '.gzip', '.bz2', ...) will not be split. See StandardDeCompressors for a list of known formats and suffixes.
The default instantiation of this enumerator filters out files whose names start with one of the common hidden file prefixes '.' and '_'. A custom file filter can be specified through the two-argument constructor.
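The hidden-file rule described above can be sketched as a plain predicate. This is a minimal illustration, not Flink's filter: it uses java.nio.file.Path in place of org.apache.flink.core.fs.Path so that it runs with the JDK alone, and the names HiddenFileFilterSketch and VISIBLE_FILES are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

/** Illustrative sketch only; the class and field names are hypothetical. */
final class HiddenFileFilterSketch {

    /** Accepts a path unless its last name element starts with '.' or '_'. */
    static final Predicate<Path> VISIBLE_FILES = path -> {
        Path name = path.getFileName();
        if (name == null) {
            // A root path has no file name; treating it as visible is an assumption.
            return true;
        }
        String fileName = name.toString();
        return !(fileName.startsWith(".") || fileName.startsWith("_"));
    };

    public static void main(String[] args) {
        // Prints whether each sample path passes the filter.
        System.out.println(VISIBLE_FILES.test(Paths.get("data/part-0.csv")));
        System.out.println(VISIBLE_FILES.test(Paths.get("data/_SUCCESS")));
        System.out.println(VISIBLE_FILES.test(Paths.get("data/.staging")));
    }
}
```

A predicate of the same shape, written against org.apache.flink.core.fs.Path, is what the fileFilter constructor argument expects.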
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.flink.connector.file.src.enumerate.FileEnumerator
FileEnumerator.Provider -
Field Summary
Fields inherited from class org.apache.flink.connector.file.src.enumerate.NonSplittingRecursiveEnumerator
fileFilter -
Constructor Summary
Constructors
Constructor | Description
BlockSplittingRecursiveEnumerator() | Creates a new enumerator that enumerates all files except hidden files.
BlockSplittingRecursiveEnumerator(Predicate<org.apache.flink.core.fs.Path> fileFilter, String[] nonSplittableFileSuffixes) | Creates a new enumerator that uses the given predicate as a filter for file paths, and avoids splitting files with the given extension (typically to avoid splitting compressed files).
-
Method Summary
Modifier and Type | Method
protected void | convertToSourceSplits(org.apache.flink.core.fs.FileStatus file, org.apache.flink.core.fs.FileSystem fs, List<FileSourceSplit> target)
protected boolean | isFileSplittable(org.apache.flink.core.fs.Path filePath)
Methods inherited from class org.apache.flink.connector.file.src.enumerate.NonSplittingRecursiveEnumerator
addSplitsForPath, enumerateSplits, getNextId
-
Constructor Details
-
BlockSplittingRecursiveEnumerator
public BlockSplittingRecursiveEnumerator()
Creates a new enumerator that enumerates all files except hidden files. Hidden files are files whose name starts with '.' or with '_'.
The enumerator does not split files that have a suffix corresponding to a known compression format (for example '.gzip', '.bz2', '.xz', '.zip', ...). See StandardDeCompressors for details.
-
BlockSplittingRecursiveEnumerator
public BlockSplittingRecursiveEnumerator(Predicate<org.apache.flink.core.fs.Path> fileFilter, String[] nonSplittableFileSuffixes)
Creates a new enumerator that uses the given predicate as a filter for file paths and does not split files whose paths end with one of the given suffixes (typically to avoid splitting compressed files).
-
-
Method Details
-
convertToSourceSplits
protected void convertToSourceSplits(org.apache.flink.core.fs.FileStatus file, org.apache.flink.core.fs.FileSystem fs, List<FileSourceSplit> target) throws IOException
- Overrides:
convertToSourceSplits in class NonSplittingRecursiveEnumerator
- Throws:
IOException
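The per-block splitting that the class description mentions can be illustrated with a small JDK-only sketch. It is not the Flink implementation: the real enumerator works with the block information exposed by the file system, whereas this sketch assumes a fixed blockSize, and the names BlockSplitSketch and toRanges are hypothetical. Each returned array is an {offset, length} pair standing in for one split.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; class and method names are hypothetical. */
final class BlockSplitSketch {

    /**
     * Cuts a file of the given length into {offset, length} ranges of at most blockSize bytes.
     * Returns a single range covering the whole file when the file is not splittable,
     * when no usable block size is known, or when the file fits into one block.
     */
    static List<long[]> toRanges(long fileLength, long blockSize, boolean splittable) {
        List<long[]> ranges = new ArrayList<>();
        if (!splittable || blockSize <= 0 || fileLength <= blockSize) {
            ranges.add(new long[] {0L, fileLength});
            return ranges;
        }
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The last range may be shorter than a full block.
            long length = Math.min(blockSize, fileLength - offset);
            ranges.add(new long[] {offset, length});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : toRanges(300L, 128L, true)) {
            System.out.println("offset=" + range[0] + ", length=" + range[1]);
        }
    }
}
```

Passing splittable = false models the case of a compressed file or a file system without block information, where the file stays one split.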
-
isFileSplittable
protected boolean isFileSplittable(org.apache.flink.core.fs.Path filePath)
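This method carries no description here. Based on the constructor documentation, a plausible reading is that a file counts as splittable unless its path ends with one of the configured non-splittable suffixes. The sketch below illustrates that check with plain strings; SplittableSketch and isSplittable are hypothetical names, and the case-sensitive comparison is an assumption.

```java
/** Illustrative sketch only; class and method names are hypothetical. */
final class SplittableSketch {

    /** Returns false if the path ends with any of the given suffixes, true otherwise. */
    static boolean isSplittable(String filePath, String[] nonSplittableSuffixes) {
        for (String suffix : nonSplittableSuffixes) {
            if (filePath.endsWith(suffix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Suffixes taken from the examples in the class description.
        String[] suffixes = {".gzip", ".bz2"};
        System.out.println(isSplittable("logs/events.csv", suffixes));
        System.out.println(isSplittable("logs/events.bz2", suffixes));
    }
}
```

With an empty suffix array every file is treated as splittable, which matches the idea that the suffix list only exists to protect compressed files.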
-