Class NonSplittingRecursiveEnumerator

java.lang.Object
org.apache.flink.connector.file.src.enumerate.NonSplittingRecursiveEnumerator
All Implemented Interfaces:
FileEnumerator
Direct Known Subclasses:
BlockSplittingRecursiveEnumerator, NonSplittingRecursiveAllDirEnumerator

@PublicEvolving public class NonSplittingRecursiveEnumerator extends Object implements FileEnumerator
This FileEnumerator enumerates all files under the given paths recursively. Each file becomes one split; this enumerator does not split files into smaller "block" units.
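
The "one file, one split" traversal described above can be sketched with plain java.nio.file types. This is only an illustration and not the Flink implementation: WholeFileSplit and RecursiveEnumerationSketch are hypothetical names, and the sketch reads the local file system rather than going through org.apache.flink.core.fs.FileSystem.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a split: one whole file, from offset 0 to its length. */
final class WholeFileSplit {
    final Path file;
    final long offset;
    final long length;

    WholeFileSplit(Path file, long offset, long length) {
        this.file = file;
        this.offset = offset;
        this.length = length;
    }
}

/** Illustrative only: walks directories recursively and never splits a file. */
final class RecursiveEnumerationSketch {

    /** Visits every root and collects exactly one split per regular file found. */
    static List<WholeFileSplit> enumerate(Path... roots) throws IOException {
        List<WholeFileSplit> target = new ArrayList<>();
        for (Path root : roots) {
            addSplitsForPath(root, target);
        }
        return target;
    }

    private static void addSplitsForPath(Path path, List<WholeFileSplit> target)
            throws IOException {
        if (Files.isDirectory(path)) {
            // Directory: descend into every child (symlink cycles are not handled here).
            try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
                for (Path child : children) {
                    addSplitsForPath(child, target);
                }
            }
        } else {
            // File: the whole file becomes a single split covering [0, size).
            target.add(new WholeFileSplit(path, 0L, Files.size(path)));
        }
    }
}
```

Calling RecursiveEnumerationSketch.enumerate(root) returns one entry per file at any depth under root. The order of the entries follows the directory listing and should not be relied on.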

By default, this enumerator filters out files whose names start with one of the common hidden-file prefixes, '.' or '_'. A custom file filter can be supplied through the constructor.
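
A filter of this kind can be sketched as a java.util.function.Predicate. The example below uses java.nio.file.Path instead of org.apache.flink.core.fs.Path so that it runs without Flink on the classpath. HiddenFileFilterSketch, NOT_HIDDEN and withoutSuffix are hypothetical names, and the convention that the predicate returns true for paths to keep is an assumption of this sketch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

/** Illustrative only: a hidden-file filter plus one custom variation. */
final class HiddenFileFilterSketch {

    /** Assumed convention: true means "keep". Rejects names starting with '.' or '_'. */
    static final Predicate<Path> NOT_HIDDEN = path -> {
        Path name = path.getFileName();
        if (name == null) {
            return true; // a root path has no file name to inspect
        }
        String s = name.toString();
        return s.isEmpty() || (s.charAt(0) != '.' && s.charAt(0) != '_');
    };

    /** Custom filter: additionally rejects names that end with the given suffix. */
    static Predicate<Path> withoutSuffix(String suffix) {
        return NOT_HIDDEN.and(path -> {
            Path name = path.getFileName();
            return name == null || !name.toString().endsWith(suffix);
        });
    }
}
```

Only the last path component is inspected, so HiddenFileFilterSketch.NOT_HIDDEN.test(Paths.get("data/_SUCCESS")) is false while the same test on "data/part-0001.csv" is true. A predicate built this way has the shape expected by the one-argument constructor, provided it is written against Flink's Path type.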

  • Field Details

    • fileFilter

      protected final Predicate<org.apache.flink.core.fs.Path> fileFilter
      The predicate used to filter out unwanted files.
  • Constructor Details

    • NonSplittingRecursiveEnumerator

      public NonSplittingRecursiveEnumerator()
      Creates a NonSplittingRecursiveEnumerator that enumerates all files except hidden files. A file is considered hidden if its name starts with '.' or '_'.
    • NonSplittingRecursiveEnumerator

      public NonSplittingRecursiveEnumerator(Predicate<org.apache.flink.core.fs.Path> fileFilter)
      Creates a NonSplittingRecursiveEnumerator that uses the given predicate as a filter for file paths.
  • Method Details

    • enumerateSplits

      public Collection<FileSourceSplit> enumerateSplits(org.apache.flink.core.fs.Path[] paths, int minDesiredSplits) throws IOException
      Description copied from interface: FileEnumerator
      Generates all file splits for the relevant files under the given paths. The minDesiredSplits argument is an optional hint indicating how many splits are needed to fully exploit the available parallelism.
      Specified by:
      enumerateSplits in interface FileEnumerator
      Throws:
      IOException
    • addSplitsForPath

      protected void addSplitsForPath(org.apache.flink.core.fs.FileStatus fileStatus, org.apache.flink.core.fs.FileSystem fs, ArrayList<FileSourceSplit> target) throws IOException
      Throws:
      IOException
    • convertToSourceSplits

      protected void convertToSourceSplits(org.apache.flink.core.fs.FileStatus file, org.apache.flink.core.fs.FileSystem fs, List<FileSourceSplit> target) throws IOException
      Throws:
      IOException
    • getNextId

      protected final String getNextId()
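
The protected methods above carry no description on this page. Judging only from their signatures and from the existence of BlockSplittingRecursiveEnumerator as a subclass, one plausible reading is that convertToSourceSplits is an overridable hook that turns one file into splits, while getNextId hands out split IDs. The sketch below models that pattern with stdlib types only. RangeSplit, WholeFileConverter, FixedSizeConverter and the sequential numeric IDs are all assumptions made for illustration, not Flink API or Flink behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical split model: an ID plus a byte range within one file. */
final class RangeSplit {
    final String id;
    final String file;
    final long offset;
    final long length;

    RangeSplit(String id, String file, long offset, long length) {
        this.id = id;
        this.file = file;
        this.offset = offset;
        this.length = length;
    }
}

/** Base behavior: each file becomes exactly one split. */
class WholeFileConverter {
    private long nextId = 0;

    /** Sequential IDs (an assumption); final so subclasses cannot break uniqueness. */
    protected final String getNextId() {
        return String.valueOf(nextId++);
    }

    /** Overridable hook: appends the splits for one file to target. */
    protected void convertToSplits(String file, long fileLength, List<RangeSplit> target) {
        target.add(new RangeSplit(getNextId(), file, 0L, fileLength));
    }

    /** Convenience entry point that invokes the hook for a single file. */
    final List<RangeSplit> convert(String file, long fileLength) {
        List<RangeSplit> target = new ArrayList<>();
        convertToSplits(file, fileLength, target);
        return target;
    }
}

/** Subclass that overrides the hook to cut each file into fixed-size ranges. */
class FixedSizeConverter extends WholeFileConverter {
    private final long chunkSize;

    FixedSizeConverter(long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    @Override
    protected void convertToSplits(String file, long fileLength, List<RangeSplit> target) {
        if (fileLength == 0) {
            super.convertToSplits(file, fileLength, target); // keep empty files as one split
            return;
        }
        for (long offset = 0; offset < fileLength; offset += chunkSize) {
            long length = Math.min(chunkSize, fileLength - offset);
            target.add(new RangeSplit(getNextId(), file, offset, length));
        }
    }
}
```

With these definitions, new FixedSizeConverter(4).convert("f", 10) yields three ranges of lengths 4, 4 and 2, whereas new WholeFileConverter().convert("f", 10) yields a single range of length 10. The traversal logic stays in the base class and only the per-file conversion changes.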