Class NestedColumnReader

java.lang.Object
org.apache.flink.formats.parquet.vector.reader.NestedColumnReader
All Implemented Interfaces:
ColumnReader<org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>

public class NestedColumnReader extends Object implements ColumnReader<org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>
This ColumnReader is mainly used to read `Group` types in Parquet, such as `Map`, `Array`, and `Row`. The approach to resolving nested structures mainly follows the striping and assembly algorithms from the Dremel paper.

Brief explanation of reading repetition and definition levels: a repetition level equal to 0 marks the beginning of a new row. Any other value means that the data should be added to the current row.

For example, given the following data:

  repetition levels: 0,1,1,0,0,1,[0] (the last 0 is implicit and normally marks the end of the page)
  values: a,b,c,d,e,f

the values are assembled into the sets (a, b, c), (d), (e, f).
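The grouping rule above can be sketched in a few lines of Java. This is only an illustration of the rule as described here, not the reader's actual implementation; the class name `RepetitionLevelSketch` and the method `assemble` are hypothetical, and the sketch assumes the stream starts with a repetition level of 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Flink.
public class RepetitionLevelSketch {

    /**
     * Splits a flat value stream into rows: a repetition level of 0 starts
     * a new row, any other level appends to the current row.
     * Assumes the first repetition level is 0.
     */
    public static List<List<String>> assemble(int[] repetitionLevels, String[] values) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (repetitionLevels[i] == 0) {
                rows.add(new ArrayList<>());
            }
            rows.get(rows.size() - 1).add(values[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] levels = {0, 1, 1, 0, 0, 1};
        String[] values = {"a", "b", "c", "d", "e", "f"};
        System.out.println(assemble(levels, values)); // prints [[a, b, c], [d], [e, f]]
    }
}
```

The implicit trailing 0 is not needed in this sketch because the loop simply ends with the input.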

Definition levels cover three cases:

  • level = maxDefLevel means the value exists and is not null
  • level = maxDefLevel - 1 means the value is null
  • level < maxDefLevel - 1 means the value doesn't exist

For non-nullable (REQUIRED) fields, the (level = maxDefLevel - 1) condition also means that the value doesn't exist.
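As a rough sketch, the three cases and the REQUIRED exception could be expressed as a small classification function. The names `DefinitionLevelSketch`, `State`, and `classify` are hypothetical and chosen only for this illustration; the actual reader may organize this logic differently.

```java
// Hypothetical illustration class; not part of Flink.
public class DefinitionLevelSketch {

    public enum State { VALUE, NULL, MISSING }

    /**
     * Classifies one definition level according to the three cases above.
     * For a REQUIRED field, (maxDefLevel - 1) is treated as MISSING, not NULL.
     */
    public static State classify(int level, int maxDefLevel, boolean required) {
        if (level == maxDefLevel) {
            return State.VALUE;
        }
        if (level == maxDefLevel - 1 && !required) {
            return State.NULL;
        }
        return State.MISSING;
    }

    public static void main(String[] args) {
        System.out.println(classify(2, 2, false)); // prints VALUE
        System.out.println(classify(1, 2, false)); // prints NULL
        System.out.println(classify(1, 2, true));  // prints MISSING
        System.out.println(classify(0, 2, false)); // prints MISSING
    }
}
```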

Quick example (maxDefLevel is 2): read 3 rows out of:

  repetition levels: 0,1,0,1,1,0,0,...
  definition levels: 2,1,0,2,1,2,...
  values: a,b,c,d,e,f,...

Resulting buffer: a,n, ,d,n,f, which yields (a,n),(d,n),(f), where n means null.
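The quick example can be reproduced with a minimal sketch that combines both kinds of levels. It is an illustration under assumptions, not the reader's real code: the names `NestedAssemblySketch` and `assemble` are hypothetical, the field is assumed to be nullable, the stream is assumed to start with a repetition level of 0, and, following the example's layout, one value slot is aligned with each level entry. Only the first six entries listed in the example are used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Flink.
public class NestedAssemblySketch {

    /**
     * Assembles up to rowsToRead rows. A repetition level of 0 starts a new
     * row. A definition level of maxDefLevel adds the value, maxDefLevel - 1
     * adds a null, and anything lower adds nothing.
     */
    public static List<List<String>> assemble(
            int[] rep, int[] def, String[] values, int maxDefLevel, int rowsToRead) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < rep.length; i++) {
            if (rep[i] == 0) {
                if (rows.size() == rowsToRead) {
                    break; // the next row begins here, so stop reading
                }
                rows.add(new ArrayList<>());
            }
            List<String> current = rows.get(rows.size() - 1);
            if (def[i] == maxDefLevel) {
                current.add(values[i]);
            } else if (def[i] == maxDefLevel - 1) {
                current.add(null);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] rep = {0, 1, 0, 1, 1, 0};
        int[] def = {2, 1, 0, 2, 1, 2};
        String[] values = {"a", "b", "c", "d", "e", "f"};
        // prints [[a, null], [d, null], [f]]
        System.out.println(assemble(rep, def, values, 2, 3));
    }
}
```

Here `b` and `e` become nulls because their definition level is maxDefLevel - 1, and `c` is dropped because its definition level is below that.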

  • Constructor Details

    • NestedColumnReader

      public NestedColumnReader(boolean isUtcTimestamp, org.apache.parquet.column.page.PageReadStore pages, ParquetField field)
  • Method Details

    • readToVector

      public void readToVector(int readNumber, org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector vector) throws IOException
      Specified by:
      readToVector in interface ColumnReader<org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>
      Parameters:
      readNumber - the number of rows to read.
      vector - the vector to write the data to.
      Throws:
      IOException