All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable, BZip2Constants

public class CBZip2InputStream extends java.io.InputStream implements BZip2Constants
The decompression requires large amounts of memory. Thus you should call the
close() method as soon as possible, to force
CBZip2InputStream to release the allocated memory. See
CBZip2OutputStream for information about memory
usage.
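As a minimal sketch of the "close as soon as possible" advice, the example below uses try-with-resources so that close() runs the moment reading finishes, even if a read fails. `TrackedStream` and `countAndClose` are hypothetical stand-ins built only on the JDK; they are not part of Hadoop, and a real caller would place a CBZip2InputStream in the resource position instead.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class PromptCloseSketch {

    /** Hypothetical stand-in for a memory-hungry decompressing stream. */
    static final class TrackedStream extends FilterInputStream {
        boolean closed;

        TrackedStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            closed = true; // a real decompressor would release its large buffers here
            super.close();
        }
    }

    /** Counts the bytes in the stream and closes it as soon as reading finishes. */
    static long countAndClose(TrackedStream stream) throws IOException {
        long total = 0;
        try (InputStream in = stream) { // close() runs even if read() throws
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        TrackedStream s = new TrackedStream(new ByteArrayInputStream(new byte[10_000]));
        System.out.println(countAndClose(s) + " bytes read, closed=" + s.closed);
    }
}
```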
CBZip2InputStream reads bytes from the compressed source stream via
the single byte read() method exclusively.
Thus you should consider using a buffered source stream.
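To illustrate why a buffered source helps, the sketch below drains a stream one byte at a time, roughly the way the decompressor consumes its source, and counts how many read calls actually reach the underlying stream with and without a java.io.BufferedInputStream in between. `CountingInputStream` and `sourceCalls` are hypothetical helpers introduced for this illustration, and the 8192-byte buffer size is an arbitrary choice.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedSourceSketch {

    /** Hypothetical helper: counts how many read calls reach the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Drains data via single-byte reads and returns the number of calls that hit the source. */
    static long sourceCalls(byte[] data, boolean buffered) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        // With buffering, single-byte reads are served from memory and the
        // source is only touched when the buffer needs refilling.
        InputStream in = buffered ? new BufferedInputStream(source, 8192) : source;
        while (in.read() != -1) {
            // discard the byte; only the call count matters here
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];
        System.out.println("unbuffered source calls: " + sourceCalls(data, false));
        System.out.println("buffered source calls:   " + sourceCalls(data, true));
    }
}
```

For a source backed by a file or a network connection, each call that reaches the source may be far more expensive than in this in-memory demonstration.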
This Ant code was enhanced so that it can decompress individual blocks of bzip2 data. The current position in the stream is an important statistic for Hadoop. For example, LineRecordReader depends solely on the current position in the stream to report progress.

The notion of position becomes complicated for compressed files. Hadoop splits are defined in terms of the compressed file, but a compressed file expands to a much larger amount of data. The problem is handled in the following way.

On object creation, the stream searches for the next block start delimiter. Once such a marker is found, the stream stops there (any compressed data read in the process is discarded) and the position is reported as the beginning of the block start delimiter. At this point the stream is ready for actual reading (i.e. decompression) of data, and subsequent read calls return data.

The position is updated when the caller of this class has read off the current block + 1 bytes. The position is not updated while a block is being read; it can only be updated on block boundaries.
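The position rule described above can be sketched as a small tracker: compressed bytes are counted as they are consumed, but the position visible to the caller only moves at block boundaries. `BlockPositionSketch` and all of its members are hypothetical names, not part of this class, and the `progress` formula is an assumption about how a record reader might turn a position into a fraction of its split.

```java
class BlockPositionSketch {
    private long processed; // compressed bytes consumed so far
    private long reported;  // position visible to the caller

    /** Starts at the beginning of the first block delimiter found in the split. */
    BlockPositionSketch(long firstBlockStart) {
        this.processed = firstBlockStart;
        this.reported = firstBlockStart;
    }

    /** Records compressed bytes consumed while decoding inside a block. */
    void consume(long compressedBytes) {
        processed += compressedBytes;
    }

    /** Publishes the position; called only when a block boundary is crossed. */
    void onBlockBoundary() {
        reported = processed;
    }

    long position() {
        return reported;
    }

    /** Fraction of the split [start, end) covered by the reported position. */
    static float progress(long start, long end, long pos) {
        if (end == start) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        BlockPositionSketch tracker = new BlockPositionSketch(100);
        tracker.consume(50);
        System.out.println("mid-block position: " + tracker.position());
        tracker.onBlockBoundary();
        System.out.println("after boundary:     " + tracker.position());
        System.out.println("progress:           " + progress(100, 300, tracker.position()));
    }
}
```

A consequence of this design is that progress advances in steps of one block rather than smoothly.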
Instances of this class are not threadsafe.
| Modifier and Type | Class | Description |
|---|---|---|
| static class | CBZip2InputStream.STATE | A state machine to keep track of the current state of the decoder. |
| Modifier and Type | Field | Description |
|---|---|---|
| static long | BLOCK_DELIMITER | |
| static long | EOS_DELIMITER | |

Fields inherited from interface BZip2Constants: baseBlockSize, END_OF_BLOCK, END_OF_STREAM, G_SIZE, MAX_ALPHA_SIZE, MAX_CODE_LEN, MAX_SELECTORS, N_GROUPS, N_ITERS, NUM_OVERSHOOT_BYTES, rNums, RUNA, RUNB

| Constructor | Description |
|---|---|
| CBZip2InputStream(java.io.InputStream in) | |
| CBZip2InputStream(java.io.InputStream in, SplittableCompressionCodec.READ_MODE readMode) | Constructs a new CBZip2InputStream which decompresses bytes read from the specified stream. |
| Modifier and Type | Method | Description |
|---|---|---|
| void | close() | |
| long | getProcessedByteCount() | This method reports the processed bytes so far. |
| static long | numberOfBytesTillNextMarker(java.io.InputStream in) | Returns the number of bytes between the current stream position and the immediate next BZip2 block marker. |
| int | read() | |
| int | read(byte[] dest, int offs, int len) | In CONTINUOUS reading mode, this read method starts from the start of the compressed stream and ends at the end of file by emitting uncompressed data. |
| protected void | reportCRCError() | |
| boolean | skipToNextMarker(long marker, int markerBitLength) | This method tries to find the marker (passed to it as the first parameter) in the stream. |
| protected void | updateProcessedByteCount(int count) | This method keeps track of raw processed compressed bytes. |
| void | updateReportedByteCount(int count) | This method is called by the client of this class in case there are any corrections in the stream position. |
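The marker search summarized for skipToNextMarker and numberOfBytesTillNextMarker can be pictured as a sliding bit window over the stream, since bzip2 blocks are not byte-aligned. The following is a sketch of that general technique, not the implementation used by this class. `MarkerScanSketch` and `findMarker` are hypothetical names, and the constant `BLOCK_MAGIC` (0x314159265359, the 48-bit bzip2 block header magic) is an assumption, because this page does not list the value of BLOCK_DELIMITER.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkerScanSketch {

    // Assumed value: the 48-bit bzip2 block header magic.
    static final long BLOCK_MAGIC = 0x314159265359L;

    /**
     * Returns the bit offset, counted from the current stream position, at
     * which the marker starts, or -1 if the stream ends before it is found.
     */
    static long findMarker(InputStream in, long marker, int markerBitLength)
            throws IOException {
        if (markerBitLength < 1 || markerBitLength > 63) {
            throw new IllegalArgumentException("markerBitLength must be in 1..63");
        }
        long mask = (1L << markerBitLength) - 1;
        long window = 0;   // the most recent markerBitLength bits
        long bitsRead = 0;
        int b;
        while ((b = in.read()) != -1) {
            for (int i = 7; i >= 0; i--) { // most significant bit first
                window = ((window << 1) | ((b >> i) & 1)) & mask;
                bitsRead++;
                if (bitsRead >= markerBitLength && window == marker) {
                    return bitsRead - markerBitLength;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // One filler byte followed by the byte-aligned magic.
        byte[] data = {(byte) 0xFF, 0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
        long offset = findMarker(new ByteArrayInputStream(data), BLOCK_MAGIC, 48);
        System.out.println("marker starts at bit " + offset);
    }
}
```

Dividing the returned bit offset by 8 gives a byte distance comparable to what numberOfBytesTillNextMarker reports, under the assumption that partial bytes are rounded down.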
public static final long BLOCK_DELIMITER
public static final long EOS_DELIMITER
public CBZip2InputStream(java.io.InputStream in,
SplittableCompressionCodec.READ_MODE readMode)
throws java.io.IOException
Although BZip2 headers are marked with the magic "BZ", this constructor expects the next byte in the stream to be the first one after the magic. Callers therefore have to skip the first two bytes; otherwise this constructor will throw an exception.
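A caller-side sketch of this requirement: consume and check the two magic bytes (ASCII 'B' and 'Z' in bzip2 files) before handing the stream to the constructor. `MagicSkipSketch` and `skipMagic` are hypothetical helpers, not part of Hadoop. The constructor call itself appears only in a comment so that the example compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MagicSkipSketch {

    /**
     * Hypothetical helper: consumes the two-byte bzip2 magic and returns the
     * same stream, now positioned on the first byte after the magic.
     */
    static InputStream skipMagic(InputStream in) throws IOException {
        int first = in.read();
        int second = in.read();
        if (first != 'B' || second != 'Z') {
            throw new IOException("Stream does not start with the bzip2 magic");
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        byte[] header = "BZh9".getBytes(StandardCharsets.US_ASCII);
        InputStream positioned = skipMagic(new ByteArrayInputStream(header));
        // A real caller would now construct the decompressor, for example:
        //   new CBZip2InputStream(positioned, readMode)
        System.out.println("next byte after magic: " + (char) positioned.read());
    }
}
```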
Parameters:
in - in.
readMode - READ_MODE.
Throws:
java.io.IOException - if the stream content is malformed or an I/O error occurs.
java.lang.NullPointerException - if in == null
public CBZip2InputStream(java.io.InputStream in)
throws java.io.IOException
Throws:
java.io.IOException
public long getProcessedByteCount()
protected void updateProcessedByteCount(int count)
Parameters:
count - the number of bytes to be added to the raw processed bytes
public void updateReportedByteCount(int count)
Parameters:
count - the number of bytes to be added to the reported bytes
public boolean skipToNextMarker(long marker,
int markerBitLength)
throws java.io.IOException,
java.lang.IllegalArgumentException
Parameters:
marker - The bit pattern to be found in the stream
markerBitLength - Number of bits in the marker
Throws:
java.io.IOException - raised on errors performing I/O.
java.lang.IllegalArgumentException - if markerBitLength is greater than 63
protected void reportCRCError()
throws java.io.IOException
Throws:
java.io.IOException
public static long numberOfBytesTillNextMarker(java.io.InputStream in)
throws java.io.IOException
Parameters:
in - The InputStream
Throws:
java.io.IOException - raised on errors performing I/O.
public int read()
throws java.io.IOException
Overrides:
read in class java.io.InputStream
Throws:
java.io.IOException
public int read(byte[] dest,
int offs,
int len)
throws java.io.IOException
Overrides:
read in class java.io.InputStream
Throws:
java.io.IOException - if the stream content is malformed or an I/O error occurs.
public void close()
throws java.io.IOException
Specified by:
close in interface java.lang.AutoCloseable
Specified by:
close in interface java.io.Closeable
Overrides:
close in class java.io.InputStream
Throws:
java.io.IOException

Copyright © 2008–2025 Apache Software Foundation. All rights reserved.