Package org.apache.flink.orc
Class AbstractOrcFileInputFormat<T,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
java.lang.Object
org.apache.flink.orc.AbstractOrcFileInputFormat<T,BatchT,SplitT>
- Type Parameters:
T - The type of records produced by this reader format.
BatchT - The type of the vectorized ORC batch wrapped by this format.
SplitT - The type of the input split, extending FileSourceSplit.
- All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>, org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>
- Direct Known Subclasses:
OrcColumnarRowInputFormat
public abstract class AbstractOrcFileInputFormat<T,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
extends Object
implements org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>
The base class for ORC readers for the FileSource.
Implements reader initialization, vectorized reading, and pooling of column vector objects.
Subclasses implement the conversion to the specific result record(s) that they return by extending AbstractOrcFileInputFormat.OrcReaderBatch.
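As a hedged sketch of the subclassing pattern described above (not runnable as-is; `MyRecord` and `MyReaderBatch` are hypothetical names, and `VectorizedRowBatch` is assumed as the batch type):

```java
// Hypothetical subclass sketch: the conversion to the result record type is
// wired in by overriding createReaderBatch() and getProducedType().
public class MyOrcInputFormat
        extends AbstractOrcFileInputFormat<MyRecord, VectorizedRowBatch, FileSourceSplit> {

    protected MyOrcInputFormat(
            org.apache.hadoop.conf.Configuration hadoopConfig,
            org.apache.orc.TypeDescription schema,
            int[] selectedFields,
            List<OrcFilters.Predicate> conjunctPredicates,
            int batchSize) {
        // OrcShim.defaultShim() is the recommended shim for the latest ORC version.
        super(OrcShim.defaultShim(), hadoopConfig, schema,
                selectedFields, conjunctPredicates, batchSize);
    }

    @Override
    public OrcReaderBatch<MyRecord, VectorizedRowBatch> createReaderBatch(
            FileSourceSplit split,
            OrcVectorizedBatchWrapper<VectorizedRowBatch> orcBatch,
            Pool.Recycler<OrcReaderBatch<MyRecord, VectorizedRowBatch>> recycler,
            int batchSize) {
        // MyReaderBatch (hypothetical) holds the column vectors and converts
        // them to MyRecord instances when the batch is read.
        return new MyReaderBatch(orcBatch, recycler, batchSize);
    }

    @Override
    public TypeInformation<MyRecord> getProducedType() {
        return TypeInformation.of(MyRecord.class);
    }
}
```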
Nested Class Summary
Nested Classes:
- protected static class AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>
  The OrcReaderBatch class holds the data structures containing the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format.
- protected static final class AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
  A vectorized ORC reader.
Field Summary
Fields:
- protected final int batchSize
- protected final List<OrcFilters.Predicate> conjunctPredicates
- protected final SerializableHadoopConfigWrapper hadoopConfigWrapper
- protected final org.apache.orc.TypeDescription schema
- protected final int[] selectedFields
- protected final OrcShim<BatchT> shim
Constructor Summary
Constructors:
- protected AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize)
Method Summary
Methods:
- AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> createReader(org.apache.flink.configuration.Configuration config, SplitT split)
- abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize)
  Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format.
- abstract org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()
  Gets the type produced by this format.
- boolean isSplittable()
- AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> restoreReader(org.apache.flink.configuration.Configuration config, SplitT split)
-
Field Details
-
shim
protected final OrcShim<BatchT> shim
-
hadoopConfigWrapper
protected final SerializableHadoopConfigWrapper hadoopConfigWrapper
-
schema
protected final org.apache.orc.TypeDescription schema -
selectedFields
protected final int[] selectedFields -
conjunctPredicates
protected final List<OrcFilters.Predicate> conjunctPredicates
-
batchSize
protected final int batchSize
-
-
Constructor Details
-
AbstractOrcFileInputFormat
protected AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize)
- Parameters:
shim - the shim for the various supported ORC versions. If you use the latest version, use OrcShim.defaultShim() directly.
hadoopConfig - the Hadoop configuration for the ORC reader.
schema - the full schema of the ORC file.
selectedFields - the fields selected for reading from the ORC schema.
conjunctPredicates - the conjunctive filter predicates that can be evaluated.
batchSize - the batch size of the ORC reader.
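As a hedged illustration of assembling these arguments (`MyOrcInputFormat` is a hypothetical subclass that forwards them to this constructor; `TypeDescription.fromString` is the standard ORC schema parser):

```java
// Read two columns from files with schema struct<id:bigint,name:string>,
// with no filter push-down and a batch size of 2048 rows.
org.apache.hadoop.conf.Configuration hadoopConf =
        new org.apache.hadoop.conf.Configuration();
org.apache.orc.TypeDescription schema =
        org.apache.orc.TypeDescription.fromString("struct<id:bigint,name:string>");
int[] selectedFields = {0, 1};                                    // both columns
List<OrcFilters.Predicate> predicates = Collections.emptyList();  // no push-down

// MyOrcInputFormat is hypothetical; it passes OrcShim.defaultShim()
// plus these arguments to the protected constructor above.
MyOrcInputFormat format = new MyOrcInputFormat(
        hadoopConf, schema, selectedFields, predicates, 2048 /* batchSize */);
```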
-
-
Method Details
-
createReader
public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> createReader(org.apache.flink.configuration.Configuration config, SplitT split) throws IOException
- Specified by:
createReader in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>
- Throws:
IOException
-
restoreReader
public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> restoreReader(org.apache.flink.configuration.Configuration config, SplitT split) throws IOException
- Specified by:
restoreReader in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>
- Throws:
IOException
-
isSplittable
public boolean isSplittable()
-
createReaderBatch
public abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize)
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format.
-
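The recycler parameter reflects the pooling of batch objects mentioned in the class description: the reader borrows a reusable batch, and downstream code returns it when consumed. The following self-contained sketch (plain Java, no Flink dependencies; all names are invented for illustration) shows that borrow/recycle pattern in miniature:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal illustration of the pooling/recycling idea behind Pool.Recycler:
// batch objects are reused instead of being allocated per read.
public class BatchPool<T> {
    private final BlockingQueue<T> pool;

    public BatchPool(int capacity) {
        this.pool = new ArrayBlockingQueue<>(capacity);
    }

    /** Pre-populates the pool with a reusable batch object. */
    public void add(T batch) {
        pool.add(batch);
    }

    /** Borrows a batch, blocking while all batches are in flight (back-pressure). */
    public T pollBatch() {
        try {
            return pool.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a batch", e);
        }
    }

    /** The recycler: returns a consumed batch to the pool for reuse. */
    public void recycle(T batch) {
        pool.add(batch);
    }
}
```

In the real format, createReaderBatch receives the recycler and stores it in the OrcReaderBatch, which invokes it once the downstream consumer releases the batch.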
getProducedType
public abstract org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()
Gets the type produced by this format.
- Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>
-