Class ParquetColumnarRowInputFormat<SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

java.lang.Object
org.apache.flink.formats.parquet.ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT>
org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat<SplitT>
All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>, org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT>, org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat

public class ParquetColumnarRowInputFormat<SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> extends ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT> implements org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
A ParquetVectorizedInputFormat that provides a RowData iterator, using ColumnarRowData to give a row view of the columnar batch.
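A minimal usage sketch wiring this format into a FileSource. It assumes the flink-parquet and flink-connector-files dependencies are on the classpath; the schema and file path below are illustrative placeholders, not part of this API:

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class ParquetReadExample {
    public static void main(String[] args) {
        // Projected schema: only these columns are read from the Parquet files.
        RowType projectedType = RowType.of(
                new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                new String[] {"id", "name"});
        TypeInformation<RowData> producedTypeInfo = InternalTypeInfo.of(projectedType);

        ParquetColumnarRowInputFormat<FileSourceSplit> format =
                new ParquetColumnarRowInputFormat<>(
                        new org.apache.hadoop.conf.Configuration(), // Hadoop config for the reader
                        projectedType,
                        producedTypeInfo,
                        500,    // batchSize: rows per vectorized batch
                        false,  // isUtcTimestamp: read timestamps in local time
                        true);  // isCaseSensitive: case-sensitive column matching

        FileSource<RowData> source = FileSource
                .forBulkFileFormat(format, new Path("/path/to/data.parquet"))
                .build();
        // `source` can now be handed to StreamExecutionEnvironment#fromSource(...).
    }
}
```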
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat

    ParquetVectorizedInputFormat.ParquetReaderBatch<T>

    Nested classes/interfaces inherited from interface org.apache.flink.connector.file.src.reader.BulkFormat

    org.apache.flink.connector.file.src.reader.BulkFormat.Reader<T extends Object>, org.apache.flink.connector.file.src.reader.BulkFormat.RecordIterator<T extends Object>
  • Field Summary

    Fields inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat

    hadoopConfig, isUtcTimestamp
  • Constructor Summary

    Constructors
    Constructor
    Description
    ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType projectedType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Constructor to create a Parquet format without extra fields.
  • Method Summary

    Modifier and Type
    Method
    Description
    static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
    ParquetColumnarRowInputFormat<SplitT>
    createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType producedRowType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Creates a partitioned ParquetColumnarRowInputFormat; the partition columns can be generated from the file Path.
    protected ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>
    createReaderBatch(org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector[] writableVectors, org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch columnarBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>> recycler)
     
    org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>
     
    protected int
    numBatchesToCirculate(org.apache.flink.configuration.Configuration config)
     
    org.apache.flink.table.plan.stats.TableStats
    reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
     

    Methods inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat

    createReader, isSplittable, restoreReader

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ParquetColumnarRowInputFormat

      public ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType projectedType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
      Constructor to create a Parquet format without extra fields.
  • Method Details

    • numBatchesToCirculate

      protected int numBatchesToCirculate(org.apache.flink.configuration.Configuration config)
      Overrides:
      numBatchesToCirculate in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
    • createReaderBatch

      protected ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData> createReaderBatch(org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector[] writableVectors, org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch columnarBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>> recycler)
      Specified by:
      createReaderBatch in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
      Parameters:
      writableVectors - vectors to write to
      columnarBatch - batch of vectors to read from
      recycler - recycler used to return the batch to the pool
    • getProducedType

      public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
      Specified by:
      getProducedType in interface org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
      Specified by:
      getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>
    • reportStatistics

      public org.apache.flink.table.plan.stats.TableStats reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
      Specified by:
      reportStatistics in interface org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
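      A sketch of invoking this hook directly, as a planner would. Here `format` is assumed to be an already-constructed ParquetColumnarRowInputFormat and `producedDataType` the DataType of its produced rows; both are placeholders from earlier setup, not defined by this method:

      ```java
      import java.util.Collections;
      import org.apache.flink.core.fs.Path;
      import org.apache.flink.table.plan.stats.TableStats;

      // Statistics are aggregated from the Parquet file footers,
      // without scanning the data pages themselves.
      TableStats stats = format.reportStatistics(
              Collections.singletonList(new Path("/path/to/data.parquet")),
              producedDataType);
      System.out.println("row count: " + stats.getRowCount());
      ```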
    • createPartitionedFormat

      public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> ParquetColumnarRowInputFormat<SplitT> createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType producedRowType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
      Creates a partitioned ParquetColumnarRowInputFormat; the partition columns can be generated from the file Path.
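      A sketch of building a partitioned format, assuming a Hive-style directory layout such as .../dt=2024-01-01/part-0.parquet and the PartitionFieldExtractor.forFileSystem factory; the schema, partition key "dt", and default-partition name are illustrative placeholders:

      ```java
      import java.util.Collections;
      import org.apache.flink.connector.file.src.FileSourceSplit;
      import org.apache.flink.connector.file.table.PartitionFieldExtractor;
      import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
      import org.apache.flink.table.types.logical.IntType;
      import org.apache.flink.table.types.logical.LogicalType;
      import org.apache.flink.table.types.logical.RowType;
      import org.apache.flink.table.types.logical.VarCharType;

      // Produced type includes the partition column "dt", which is not stored
      // in the Parquet files but extracted from the split's path.
      RowType producedRowType = RowType.of(
              new LogicalType[] {
                      new IntType(),
                      new VarCharType(VarCharType.MAX_LENGTH),
                      new VarCharType(VarCharType.MAX_LENGTH)},
              new String[] {"id", "name", "dt"});

      // Extracts partition values from the path, e.g. .../dt=2024-01-01/file.parquet.
      PartitionFieldExtractor<FileSourceSplit> extractor =
              PartitionFieldExtractor.forFileSystem("__DEFAULT_PARTITION__");

      ParquetColumnarRowInputFormat<FileSourceSplit> format =
              ParquetColumnarRowInputFormat.createPartitionedFormat(
                      new org.apache.hadoop.conf.Configuration(),
                      producedRowType,                 // includes the partition column
                      InternalTypeInfo.of(producedRowType),
                      Collections.singletonList("dt"), // partition keys
                      extractor,
                      500, false, true);
      ```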