Class HadoopDataInputStream

java.lang.Object
java.io.InputStream
org.apache.flink.core.fs.FSDataInputStream
org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream
All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.flink.core.fs.ByteBufferReadable

public final class HadoopDataInputStream extends org.apache.flink.core.fs.FSDataInputStream implements org.apache.flink.core.fs.ByteBufferReadable
Concrete implementation of the FSDataInputStream for Hadoop's input streams. This supports all file systems supported by Hadoop, such as HDFS and S3 (S3a/S3n).
  • Field Details

    • MIN_SKIP_BYTES

      public static final int MIN_SKIP_BYTES
      Minimum amount of bytes to skip forward before we issue a seek instead of discarding read.

      The current value is just a magic number. In the long run, this value could become configurable, but for now it is a conservative, relatively small value that should bring safe improvements for small skips (e.g. in reading meta data), that would hurt the most with frequent seeks.

      The optimal value depends on the DFS implementation and configuration plus the underlying filesystem. For now, this number is chosen "big enough" to provide improvements for smaller seeks, and "small enough" to avoid disadvantages over real seeks. While the minimum should be the page size, a true optimum per system would be the amounts of bytes the can be consumed sequentially within the seektime. Unfortunately, seektime is not constant and devices, OS, and DFS potentially also use read buffers and read-ahead.

      See Also:
  • Constructor Details

    • HadoopDataInputStream

      public HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream)
      Creates a new data input stream from the given Hadoop input stream.
      Parameters:
      fsDataInputStream - The Hadoop input stream
  • Method Details

    • seek

      public void seek(long seekPos) throws IOException
      Specified by:
      seek in class org.apache.flink.core.fs.FSDataInputStream
      Throws:
      IOException
    • getPos

      public long getPos() throws IOException
      Specified by:
      getPos in class org.apache.flink.core.fs.FSDataInputStream
      Throws:
      IOException
    • read

      public int read() throws IOException
      Specified by:
      read in class InputStream
      Throws:
      IOException
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class InputStream
      Throws:
      IOException
    • read

      public int read(@Nonnull byte[] buffer, int offset, int length) throws IOException
      Overrides:
      read in class InputStream
      Throws:
      IOException
    • available

      public int available() throws IOException
      Overrides:
      available in class InputStream
      Throws:
      IOException
    • skip

      public long skip(long n) throws IOException
      Overrides:
      skip in class InputStream
      Throws:
      IOException
    • getHadoopInputStream

      public org.apache.hadoop.fs.FSDataInputStream getHadoopInputStream()
      Gets the wrapped Hadoop input stream.
      Returns:
      The wrapped Hadoop input stream.
    • forceSeek

      public void forceSeek(long seekPos) throws IOException
      Positions the stream to the given location. In contrast to seek(long), this method will always issue a "seek" command to the dfs and may not replace it by skip(long) for small seeks.

      Notice that the underlying DFS implementation can still decide to do skip instead of seek.

      Parameters:
      seekPos - the position to seek to.
      Throws:
      IOException
    • skipFully

      public void skipFully(long bytes) throws IOException
      Skips over a given amount of bytes in the stream.
      Parameters:
      bytes - the number of bytes to skip.
      Throws:
      IOException
    • read

      public int read(ByteBuffer byteBuffer) throws IOException
      Specified by:
      read in interface org.apache.flink.core.fs.ByteBufferReadable
      Throws:
      IOException
    • read

      public int read(long position, ByteBuffer byteBuffer) throws IOException
      Specified by:
      read in interface org.apache.flink.core.fs.ByteBufferReadable
      Throws:
      IOException