Package org.apache.flink.orc
Class OrcColumnarRowInputFormat<BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
java.lang.Object
org.apache.flink.orc.AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT>
org.apache.flink.orc.OrcColumnarRowInputFormat<BatchT,SplitT>
- All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>, org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT>, org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
public class OrcColumnarRowInputFormat<BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
extends AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT>
implements org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
An ORC reader that produces a stream of ColumnarRowData records.
This class can add extra fields through the ColumnBatchFactory, for example partition fields extracted from the split's file path. Therefore, the result of getProducedType() may differ from the file schema, and the types of the extra fields must be included in it.
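To illustrate how a partition field can be "extracted from path" as described above, the sketch below recovers a Hive-style `key=value` segment from a split's file path. The helper is a simplified, hypothetical stand-in for what a PartitionFieldExtractor does; it is not Flink's implementation.

```java
// Simplified illustration of path-based partition extraction.
// PartitionFieldExtractor in Flink does this (plus type conversion) for the
// generated partition columns; this class is an assumed stand-in for the idea.
public class PartitionPathSketch {

    /** Returns the value of a Hive-style "key=value" path segment, or null if absent. */
    static String extractPartitionValue(String path, String key) {
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0 && segment.substring(0, eq).equals(key)) {
                return segment.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A split path like .../dt=2024-01-01/part-0.orc yields "2024-01-01" for key "dt".
        System.out.println(
                extractPartitionValue("hdfs:///warehouse/t/dt=2024-01-01/part-0.orc", "dt"));
    }
}
```

The extracted value is what ends up in the extra partition column of the produced rows, which is why getProducedType() must account for fields that are not physically stored in the ORC file.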
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat
AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>, AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
Field Summary
Fields inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat
batchSize, conjunctPredicates, hadoopConfigWrapper, schema, selectedFields, shim
Constructor Summary
Constructors
OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo)
Method Summary
static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType tableType, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, Function<org.apache.flink.table.types.logical.RowType,org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>> rowTypeInfoFactory)
Creates a partitioned OrcColumnarRowInputFormat; the partition columns can be generated from the split.

AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT>> recycler, int batchSize)
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format.

org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
Gets the type produced by this format.

org.apache.flink.table.plan.stats.TableStats reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)

Methods inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat
createReader, isSplittable, restoreReader
Constructor Details
OrcColumnarRowInputFormat
public OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT, SplitT> batchFactory, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo)
Method Details
createReaderBatch
public AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT>> recycler, int batchSize)
Description copied from class: AbstractOrcFileInputFormat
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format.
- Specified by:
createReaderBatch in class AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
getProducedType
public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
Description copied from class: AbstractOrcFileInputFormat
Gets the type produced by this format.
- Specified by:
getProducedType in interface org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
- Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>
- Specified by:
getProducedType in class AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
reportStatistics
public org.apache.flink.table.plan.stats.TableStats reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
- Specified by:
reportStatistics in interface org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
createPartitionedFormat
public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType tableType, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, Function<org.apache.flink.table.types.logical.RowType,org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>> rowTypeInfoFactory)
Creates a partitioned OrcColumnarRowInputFormat; the partition columns can be generated from the split.
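A minimal wiring sketch for this factory method, assuming a Flink classpath that provides flink-orc and the file connector. The table type, partition key `dt`, projected field indices, and batch size below are illustrative assumptions, not values prescribed by this class; this is a configuration sketch rather than a runnable program.

```java
// Sketch: building a partitioned OrcColumnarRowInputFormat and handing it to a
// FileSource. The concrete RowType, partition key, and projection are assumptions.
import java.util.Collections;
import java.util.List;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.file.table.PartitionFieldExtractor;
import org.apache.flink.core.fs.Path;
import org.apache.flink.orc.OrcColumnarRowInputFormat;
import org.apache.flink.orc.shim.OrcShim;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.RowType;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

public class OrcSourceSketch {
    public static FileSource<RowData> build(RowType tableType /* full table schema, incl. "dt" */) {
        OrcColumnarRowInputFormat<VectorizedRowBatch, FileSourceSplit> format =
                OrcColumnarRowInputFormat.createPartitionedFormat(
                        OrcShim.defaultShim(),
                        new org.apache.hadoop.conf.Configuration(),
                        tableType,
                        List.of("dt"),                       // partition keys taken from the path
                        PartitionFieldExtractor.forFileSystem("__DEFAULT_PARTITION__"),
                        new int[] {0, 1, 2},                 // projected field indices (assumed)
                        Collections.emptyList(),             // no predicates pushed down
                        2048,                                // batch size (assumed)
                        InternalTypeInfo::of);               // rowTypeInfoFactory

        // The format's produced type now covers the partition column "dt" as well,
        // even though it is not physically stored in the ORC files.
        return FileSource.forBulkFileFormat(format, new Path("hdfs:///warehouse/my_table"))
                .build();
    }
}
```

Passing a rowTypeInfoFactory such as `InternalTypeInfo::of` lets the format derive the produced TypeInformation from the full RowType, which is how the extra partition field types end up in getProducedType().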