Class AbstractColumnReader<VECTOR extends org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>

java.lang.Object
org.apache.flink.formats.parquet.vector.reader.AbstractColumnReader<VECTOR>
All Implemented Interfaces:
ColumnReader<VECTOR>
Direct Known Subclasses:
BooleanColumnReader, ByteColumnReader, BytesColumnReader, DoubleColumnReader, FixedLenBytesColumnReader, FloatColumnReader, IntColumnReader, LongColumnReader, ShortColumnReader, TimestampColumnReader

public abstract class AbstractColumnReader<VECTOR extends org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector> extends Object implements ColumnReader<VECTOR>
Abstract ColumnReader. See ColumnReaderImpl; part of the code is adapted from Apache Spark and Apache Parquet.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected final org.apache.parquet.column.ColumnDescriptor
    descriptor
     
    protected final org.apache.parquet.column.Dictionary
    dictionary
    The dictionary, if this column has dictionary encoding.
    protected final int
    maxDefLevel
    Maximum definition level for this column.
    protected org.apache.flink.formats.parquet.vector.reader.RunLengthDecoder
    runLenDecoder
    Run length decoder for data and dictionary.
  • Constructor Summary

    Constructors
    Constructor
    Description
    AbstractColumnReader(org.apache.parquet.column.ColumnDescriptor descriptor, org.apache.parquet.column.page.PageReader pageReader)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    protected void
    afterReadPage()
    Called after a page is read; subclasses may need to perform some initialization here.
    protected void
    checkTypeName(org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName expectedName)
     
    protected abstract void
    readBatch(int rowId, int num, VECTOR column)
    Reads a batch from runLenDecoder and dataInputStream.
    protected abstract void
    readBatchFromDictionaryIds(int rowId, int num, VECTOR column, org.apache.flink.table.data.columnar.vector.writable.WritableIntVector dictionaryIds)
    Decode dictionary ids to data.
    final void
    readToVector(int readNumber, VECTOR vector)
    Reads `readNumber` values from this column reader into `vector`.
    protected boolean
    supportLazyDecode()
    Whether lazy decoding of dictionary ids is supported.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • dictionary

      protected final org.apache.parquet.column.Dictionary dictionary
      The dictionary, if this column has dictionary encoding.
    • maxDefLevel

      protected final int maxDefLevel
      Maximum definition level for this column.
    • descriptor

      protected final org.apache.parquet.column.ColumnDescriptor descriptor
    • runLenDecoder

      protected org.apache.flink.formats.parquet.vector.reader.RunLengthDecoder runLenDecoder
      Run length decoder for data and dictionary.
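
      As an illustration of what maxDefLevel means in Parquet: a value is present (non-null) only when its definition level equals the column's maximum definition level. The sketch below is a simplified, self-contained model of that rule. The class DefLevelSketch, the method readWithDefLevels, and the array-based inputs are hypothetical stand-ins chosen for illustration; they are not Flink or Parquet APIs and do not reproduce how this class reads pages.

```java
import java.util.Arrays;

public class DefLevelSketch {

    /**
     * Hypothetical helper: expands a dense run of non-null values into a
     * nullable column, using one definition level per row.
     */
    static Integer[] readWithDefLevels(int[] defLevels, int[] values, int maxDefLevel) {
        Integer[] column = new Integer[defLevels.length];
        int valueIdx = 0; // position in the dense (non-null only) value stream
        for (int row = 0; row < defLevels.length; row++) {
            if (defLevels[row] == maxDefLevel) {
                // Fully defined: consume the next stored value.
                column[row] = values[valueIdx++];
            } else {
                // Defined below the maximum level: the value is null.
                column[row] = null;
            }
        }
        return column;
    }

    public static void main(String[] args) {
        int[] defLevels = {1, 0, 1, 1, 0}; // assumed: optional top-level column, maxDefLevel = 1
        int[] values = {10, 20, 30};       // only non-null values are stored
        System.out.println(Arrays.toString(readWithDefLevels(defLevels, values, 1)));
        // prints [10, null, 20, 30, null]
    }
}
```

      In this model the value stream is shorter than the row count, which is why a reader needs the definition levels in addition to the data itself.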
  • Constructor Details

    • AbstractColumnReader

      public AbstractColumnReader(org.apache.parquet.column.ColumnDescriptor descriptor, org.apache.parquet.column.page.PageReader pageReader) throws IOException
      Throws:
      IOException
  • Method Details

    • checkTypeName

      protected void checkTypeName(org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName expectedName)
    • readToVector

      public final void readToVector(int readNumber, VECTOR vector) throws IOException
      Reads `readNumber` values from this column reader into `vector`.
      Specified by:
      readToVector in interface ColumnReader<VECTOR extends org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>
      Parameters:
      readNumber - the number of values to read.
      vector - the vector to write the values into.
      Throws:
      IOException
    • afterReadPage

      protected void afterReadPage()
      Called after a page is read; subclasses may need to perform some initialization here.
    • supportLazyDecode

      protected boolean supportLazyDecode()
      Whether lazy decoding of dictionary ids is supported. See ParquetDictionary for details. If this returns false, all the data is decoded first.
    • readBatch

      protected abstract void readBatch(int rowId, int num, VECTOR column)
      Reads a batch from runLenDecoder and dataInputStream.
    • readBatchFromDictionaryIds

      protected abstract void readBatchFromDictionaryIds(int rowId, int num, VECTOR column, org.apache.flink.table.data.columnar.vector.writable.WritableIntVector dictionaryIds)
      Decodes dictionary ids to data, reading from runLenDecoder and dictionaryIdsDecoder.
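
      The difference between decoding dictionary ids up front and decoding them lazily can be sketched with a simplified model. With eager decoding, every id in the range is replaced by its dictionary value immediately; with lazy decoding, the ids and the dictionary are kept and a value is resolved only when it is accessed. Everything in the sketch below (DictionaryDecodeSketch, decodeEagerly, LazyColumn, and the plain arrays standing in for vectors and the dictionary) is a hypothetical stand-in for illustration, not the Flink classes, and null handling is omitted.

```java
public class DictionaryDecodeSketch {

    /**
     * Eager path (hypothetical): resolves the ids in rows
     * [rowId, rowId + num) and writes the values into the column.
     */
    static void decodeEagerly(int rowId, int num, long[] column,
                              int[] dictionaryIds, long[] dictionary) {
        for (int i = rowId; i < rowId + num; i++) {
            column[i] = dictionary[dictionaryIds[i]];
        }
    }

    /**
     * Lazy path (hypothetical): keeps ids and dictionary together and
     * looks a value up only when a row is read.
     */
    static final class LazyColumn {
        private final int[] dictionaryIds;
        private final long[] dictionary;

        LazyColumn(int[] dictionaryIds, long[] dictionary) {
            this.dictionaryIds = dictionaryIds;
            this.dictionary = dictionary;
        }

        long getLong(int row) {
            return dictionary[dictionaryIds[row]];
        }
    }

    public static void main(String[] args) {
        long[] dictionary = {100L, 200L, 300L};
        int[] ids = {2, 0, 0, 1};

        long[] eager = new long[ids.length];
        decodeEagerly(0, ids.length, eager, ids, dictionary);

        LazyColumn lazy = new LazyColumn(ids, dictionary);
        for (int row = 0; row < ids.length; row++) {
            // Both paths yield the same value for every row.
            System.out.println(eager[row] + " " + lazy.getLong(row));
        }
    }
}
```

      The lazy variant defers the lookup cost to the rows that are actually read, at the price of keeping the ids and the dictionary alive alongside the column.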