Class HadoopDataInputStream
- All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.flink.core.fs.ByteBufferReadable

FSDataInputStream for Hadoop's input streams. This supports all file systems supported by Hadoop, such as HDFS and S3 (S3a/S3n).
Field Summary
- static final int MIN_SKIP_BYTES
  Minimum number of bytes to skip forward before we issue a seek instead of discarding reads.

Constructor Summary
- HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream)
  Creates a new data input stream from the given Hadoop input stream.

Method Summary
- int available()
- void close()
- void forceSeek(long seekPos)
  Positions the stream to the given location.
- org.apache.hadoop.fs.FSDataInputStream getHadoopInputStream()
  Gets the wrapped Hadoop input stream.
- long getPos()
- int read()
- int read(byte[] buffer, int offset, int length)
- int read(long position, ByteBuffer byteBuffer)
- int read(ByteBuffer byteBuffer)
- void seek(long seekPos)
- long skip(long n)
- void skipFully(long bytes)
  Skips over a given number of bytes in the stream.

Methods inherited from class java.io.InputStream:
mark, markSupported, nullInputStream, read, readAllBytes, readNBytes, readNBytes, reset, transferTo
-
Field Details
-
MIN_SKIP_BYTES
public static final int MIN_SKIP_BYTES

Minimum number of bytes to skip forward before we issue a seek instead of discarding reads. The current value is just a magic number. In the long run, this value could become configurable, but for now it is a conservative, relatively small value that should bring safe improvements for small skips (e.g. when reading metadata), which would hurt the most under frequent seeks.

The optimal value depends on the DFS implementation and configuration, plus the underlying file system. For now, this number is chosen "big enough" to provide improvements for smaller seeks, and "small enough" to avoid disadvantages over real seeks. While the minimum should be the page size, a true optimum per system would be the amount of bytes that can be consumed sequentially within the seek time. Unfortunately, the seek time is not constant, and devices, OS, and DFS potentially also use read buffers and read-ahead.
-
-
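The trade-off described above boils down to a threshold check: move forward by discarding reads when the distance is small, and issue a real seek otherwise. A minimal sketch of that decision follows; the class name, the helper method, and the concrete threshold value are illustrative assumptions, not Flink's actual implementation.

```java
// Sketch of a skip-vs-seek heuristic (hypothetical helper, not Flink code).
public class SeekHeuristic {

    // Hypothetical threshold, analogous in spirit to MIN_SKIP_BYTES:
    // forward moves at or below this distance discard bytes instead of seeking.
    static final int MIN_SKIP_BYTES = 4 * 1024;

    /**
     * Returns true when a forward move to targetPos should be realized by
     * skipping (discarding reads) rather than issuing a real seek.
     * Backward moves always require a seek, so they return false.
     */
    static boolean shouldSkipInsteadOfSeek(long currentPos, long targetPos) {
        long delta = targetPos - currentPos;
        return delta > 0 && delta <= MIN_SKIP_BYTES;
    }
}
```

With a 4 KiB threshold, a 100-byte forward move would be skipped, while a backward move or a megabyte-scale jump would trigger a real seek.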
Constructor Details
-
HadoopDataInputStream
public HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream)

Creates a new data input stream from the given Hadoop input stream.
- Parameters:
fsDataInputStream - The Hadoop input stream
-
-
Method Details
-
seek
public void seek(long seekPos) throws IOException
- Specified by:
seek in class org.apache.flink.core.fs.FSDataInputStream
- Throws:
IOException
-
getPos
public long getPos() throws IOException
- Specified by:
getPos in class org.apache.flink.core.fs.FSDataInputStream
- Throws:
IOException
-
read
public int read() throws IOException
- Specified by:
read in class InputStream
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class InputStream
- Throws:
IOException
-
read
public int read(byte[] buffer, int offset, int length) throws IOException
- Overrides:
read in class InputStream
- Throws:
IOException
-
available
public int available() throws IOException
- Overrides:
available in class InputStream
- Throws:
IOException
-
skip
public long skip(long n) throws IOException
- Overrides:
skip in class InputStream
- Throws:
IOException
-
getHadoopInputStream
public org.apache.hadoop.fs.FSDataInputStream getHadoopInputStream()

Gets the wrapped Hadoop input stream.
- Returns:
The wrapped Hadoop input stream.
-
forceSeek
public void forceSeek(long seekPos) throws IOException

Positions the stream to the given location. In contrast to seek(long), this method will always issue a "seek" command to the DFS and may not replace it by skip(long) for small seeks. Note that the underlying DFS implementation can still decide to do a skip instead of a seek.
- Parameters:
seekPos - the position to seek to.
- Throws:
IOException
-
skipFully
public void skipFully(long bytes) throws IOException

Skips over a given number of bytes in the stream.
- Parameters:
bytes - the number of bytes to skip.
- Throws:
IOException
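A "skip fully" helper is needed because java.io.InputStream.skip(n) is allowed to skip fewer than n bytes. The following sketch shows the usual retry loop such a method has to perform; it is an illustrative, self-contained version, not Flink's actual implementation, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch of a skipFully-style helper: retries skip() until the requested
// number of bytes has been consumed, falling back to read() when skip()
// makes no progress (a hypothetical demo, not Flink code).
public class SkipFullyDemo {

    static void skipFully(InputStream in, long bytes) throws IOException {
        while (bytes > 0) {
            long skipped = in.skip(bytes);
            if (skipped > 0) {
                bytes -= skipped;
            } else if (in.read() >= 0) {
                // skip() made no progress; consume one byte via read() instead.
                bytes--;
            } else {
                throw new EOFException("Unexpected end of stream while skipping");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        skipFully(in, 3);          // discard the first three bytes
        System.out.println(in.read()); // prints 4
    }
}
```

The read() fallback and the EOFException distinguish "slow progress" from "stream exhausted", which a bare skip(n) call cannot do on its own.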
-
read
- Specified by:
read in interface org.apache.flink.core.fs.ByteBufferReadable
- Throws:
IOException
-
read
- Specified by:
read in interface org.apache.flink.core.fs.ByteBufferReadable
- Throws:
IOException
-