Class OrcColumnarRowInputFormat<BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

java.lang.Object
org.apache.flink.orc.AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT>
org.apache.flink.orc.OrcColumnarRowInputFormat<BatchT,SplitT>
All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>, org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT>, org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat

public class OrcColumnarRowInputFormat<BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> extends AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT> implements org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
An ORC reader that produces a stream of ColumnarRowData records.

This class can add extra fields through the ColumnBatchFactory, for example partition fields extracted from the file path. As a result, getProducedType() may differ from the physical file schema, and the types of the extra fields must be included in it.
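The relationship between the physical projection and the produced type can be illustrated with a plain-Java sketch. The class and field names below are hypothetical and not part of the Flink API; this only shows that the produced schema is the selected physical columns plus any extra fields supplied by the ColumnBatchFactory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: the produced type is the projected physical
// columns followed by any extra fields (e.g. partition fields from the path).
public class ProducedSchemaSketch {
    public static List<String> producedFieldNames(
            List<String> selectedPhysicalFields, List<String> extraFields) {
        List<String> produced = new ArrayList<>(selectedPhysicalFields);
        produced.addAll(extraFields); // extra fields appended at the end
        return produced;
    }

    public static void main(String[] args) {
        System.out.println(producedFieldNames(
                Arrays.asList("id", "amount"),
                Arrays.asList("dt"))); // "dt" stands in for a partition column
    }
}
```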

  • Constructor Details

    • OrcColumnarRowInputFormat

      public OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo)
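The selectedFields argument expresses the projection as indices into the full ORC schema. A small, hypothetical helper (not part of the Flink API) sketches how such indices might be derived from column names:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: map projected column names to their positions in the
// full file schema, producing the int[] shape expected by selectedFields.
public class ProjectionSketch {
    public static int[] toFieldIndices(List<String> allColumns, List<String> projected) {
        return projected.stream()
                .mapToInt(name -> {
                    int idx = allColumns.indexOf(name);
                    if (idx < 0) {
                        throw new IllegalArgumentException("Unknown column: " + name);
                    }
                    return idx;
                })
                .toArray();
    }

    public static void main(String[] args) {
        int[] indices = toFieldIndices(
                Arrays.asList("id", "name", "amount", "ts"),
                Arrays.asList("id", "amount"));
        System.out.println(Arrays.toString(indices)); // [0, 2]
    }
}
```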
  • Method Details

    • createReaderBatch

      public AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT>> recycler, int batchSize)
      Description copied from class: AbstractOrcFileInputFormat
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which holds the data structures backing the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format.
      Specified by:
      createReaderBatch in class AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
    • getProducedType

      public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
      Description copied from class: AbstractOrcFileInputFormat
      Gets the type produced by this format.
      Specified by:
getProducedType in interface org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
      Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>
      Specified by:
      getProducedType in class AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
    • reportStatistics

      public org.apache.flink.table.plan.stats.TableStats reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
      Specified by:
      reportStatistics in interface org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
    • createPartitionedFormat

      public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType tableType, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, Function<org.apache.flink.table.types.logical.RowType,org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>> rowTypeInfoFactory)
Creates a partitioned OrcColumnarRowInputFormat whose partition columns can be generated from the split, for example by extracting them from the split's file path.
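Conceptually, the PartitionFieldExtractor recovers partition values from the split's file path. The following plain-Java sketch (a hypothetical helper, not the Flink implementation) shows the idea for Hive-style key=value path segments:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of extracting partition values from a Hive-style path
// such as /warehouse/sales/dt=2024-01-01/part-0.orc.
public class PartitionPathSketch {
    public static Map<String, String> extractPartitionValues(String path) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) { // only key=value segments are partition directories
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extractPartitionValues(
                "/warehouse/sales/dt=2024-01-01/region=eu/part-0.orc"));
        // prints {dt=2024-01-01, region=eu}
    }
}
```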