Class BlockSplittingRecursiveAllDirEnumerator

All Implemented Interfaces:
FileEnumerator

@Internal public class BlockSplittingRecursiveAllDirEnumerator extends BlockSplittingRecursiveEnumerator
This FileEnumerator recursively enumerates all files under the given paths, skipping hidden directories, and creates a separate split for each file block.

Note that only some file systems, such as HDFS, expose file blocks. On file systems that do not expose block information, each file is kept as a single source split rather than being divided into multiple splits.

Files with suffixes corresponding to known compression formats (for example '.gzip', '.bz2', ...) will not be split. See StandardDeCompressors for a list of known formats and suffixes.

Compared to BlockSplittingRecursiveEnumerator, this enumerator enumerates all files even if their parent directory is filtered out by the file filter.
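
The difference from a pruning traversal can be illustrated with a small, self-contained sketch. It does not use Flink classes: Node, sampleTree, enumerateAllDirs and enumeratePruning are hypothetical names introduced only for this illustration, file paths are plain strings, and the assumption is that hidden entries (names starting with '.' or '_') are skipped in both variants. It is not the actual implementation of either enumerator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class AllDirEnumerationSketch {

    /** Hypothetical in-memory stand-in for a file or directory; not a Flink type. */
    static final class Node {
        final String path;
        final List<Node> children; // null marks a regular file

        private Node(String path, List<Node> children) {
            this.path = path;
            this.children = children;
        }

        static Node file(String path) {
            return new Node(path, null);
        }

        static Node dir(String path, Node... children) {
            return new Node(path, Arrays.asList(children));
        }

        boolean isDir() {
            return children != null;
        }

        String name() {
            return path.substring(path.lastIndexOf('/') + 1);
        }
    }

    /** Assumption: an entry is hidden if its name starts with '.' or '_'. */
    static boolean isHidden(Node node) {
        String name = node.name();
        return name.startsWith(".") || name.startsWith("_");
    }

    /** Descends into every non-hidden directory; the filter is applied to files only. */
    static void enumerateAllDirs(Node node, Predicate<String> fileFilter, List<String> out) {
        if (isHidden(node)) {
            return;
        }
        if (node.isDir()) {
            for (Node child : node.children) {
                enumerateAllDirs(child, fileFilter, out);
            }
        } else if (fileFilter.test(node.path)) {
            out.add(node.path);
        }
    }

    /** Contrast: a directory rejected by the filter is pruned together with its contents. */
    static void enumeratePruning(Node node, Predicate<String> fileFilter, List<String> out) {
        if (isHidden(node) || !fileFilter.test(node.path)) {
            return;
        }
        if (node.isDir()) {
            for (Node child : node.children) {
                enumeratePruning(child, fileFilter, out);
            }
        } else {
            out.add(node.path);
        }
    }

    /** Small example tree with one visible CSV file, one hidden file and one hidden directory. */
    static Node sampleTree() {
        return Node.dir("/data",
                Node.dir("/data/2024",
                        Node.file("/data/2024/a.csv"),
                        Node.file("/data/2024/b.txt"),
                        Node.file("/data/2024/.hidden.csv")),
                Node.dir("/data/.staging",
                        Node.file("/data/.staging/c.csv")),
                Node.file("/data/_SUCCESS"));
    }

    public static void main(String[] args) {
        Predicate<String> csvOnly = p -> p.matches(".*\\.csv");
        List<String> all = new ArrayList<>();
        List<String> pruned = new ArrayList<>();
        enumerateAllDirs(sampleTree(), csvOnly, all);
        enumeratePruning(sampleTree(), csvOnly, pruned);
        System.out.println(all);
        System.out.println(pruned);
    }
}
```

In this sketch the regex matches only paths ending in ".csv", so the directories "/data" and "/data/2024" fail the filter themselves. The all-directories variant still finds "/data/2024/a.csv", while the pruning variant returns nothing because it stops at the root directory.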

  • Constructor Details

    • BlockSplittingRecursiveAllDirEnumerator

      public BlockSplittingRecursiveAllDirEnumerator(String pathPattern)
      Creates a new enumerator that enumerates all files whose path matches the given regex, except hidden files. A file is considered hidden if its name starts with '.' or '_'.

      The enumerator does not split files that have a suffix corresponding to a known compression format (for example '.gzip', '.bz2', '.xz', '.zip', ...). See StandardDeCompressors for details.

    • BlockSplittingRecursiveAllDirEnumerator

      public BlockSplittingRecursiveAllDirEnumerator(Predicate<org.apache.flink.core.fs.Path> fileFilter, String[] nonSplittableFileSuffixes)
      Creates a new enumerator that uses the given predicate as a filter for file paths, and avoids splitting files with any of the given suffixes (typically to avoid splitting compressed files).
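
How a list of non-splittable suffixes might translate into a per-file decision can be sketched as follows. This is a standalone illustration, not Flink code: the class NonSplittableSuffixSketch and its method isSplittable are hypothetical, paths are plain strings, and the sketch assumes a suffix is the part of the file name after the last '.', accepted with or without a leading dot.

```java
import java.util.HashSet;
import java.util.Set;

final class NonSplittableSuffixSketch {

    private final Set<String> nonSplittable = new HashSet<>();

    NonSplittableSuffixSketch(String[] nonSplittableFileSuffixes) {
        for (String suffix : nonSplittableFileSuffixes) {
            // Assumption: normalize ".gz" and "gz" to the same entry.
            nonSplittable.add(suffix.startsWith(".") ? suffix.substring(1) : suffix);
        }
    }

    /** Returns false if the file name ends in one of the configured suffixes. */
    boolean isSplittable(String path) {
        int lastSlash = path.lastIndexOf('/');
        int lastDot = path.lastIndexOf('.');
        if (lastDot <= lastSlash) {
            // The file name itself contains no '.', so there is no suffix to check.
            return true;
        }
        return !nonSplittable.contains(path.substring(lastDot + 1));
    }

    public static void main(String[] args) {
        NonSplittableSuffixSketch sketch =
                new NonSplittableSuffixSketch(new String[] {"gz", ".bz2"});
        System.out.println(sketch.isSplittable("/data/a.csv"));
        System.out.println(sketch.isSplittable("/data/a.csv.gz"));
    }
}
```

Under these assumptions a plain file such as "/data/a.csv" remains eligible for per-block splitting, while "/data/a.csv.gz" is kept as a single split, since a compressed stream generally cannot be read starting from an arbitrary block boundary.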
  • Method Details