Class ParquetSplitReaderUtil

java.lang.Object
org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil

public class ParquetSplitReaderUtil extends Object
Util for generating ParquetColumnarRowSplitReader.
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type
    Method
    Description
    buildFieldsList(List<org.apache.flink.table.types.logical.RowType.RowField> childrens, List<String> fieldNames, org.apache.parquet.io.MessageColumnIO columnIO)
     
    createColumnReader(boolean isUtcTimestamp, org.apache.flink.table.types.logical.LogicalType fieldType, org.apache.parquet.schema.Type type, List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors, org.apache.parquet.column.page.PageReadStore pages, ParquetField field, int depth)
     
    static org.apache.flink.table.data.columnar.vector.ColumnVector
    createVectorFromConstant(org.apache.flink.table.types.logical.LogicalType type, Object value, int batchSize)
     
    static org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector
    createWritableColumnVector(int batchSize, org.apache.flink.table.types.logical.LogicalType fieldType, org.apache.parquet.schema.Type type, List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors, int depth)
     
    genPartColumnarRowReader(boolean utcTimestamp, boolean caseSensitive, org.apache.hadoop.conf.Configuration conf, String[] fullFieldNames, org.apache.flink.table.types.DataType[] fullFieldTypes, Map<String,Object> partitionSpec, int[] selectedFields, int batchSize, org.apache.flink.core.fs.Path path, long splitStart, long splitLength)
    Util for generating partitioned ParquetColumnarRowSplitReader.
    static org.apache.parquet.io.ColumnIO
    getArrayElementColumn(org.apache.parquet.io.ColumnIO columnIO)
     
    static org.apache.parquet.io.GroupColumnIO
    getMapKeyValueColumn(org.apache.parquet.io.GroupColumnIO groupColumnIO)
     
    static org.apache.parquet.io.ColumnIO
    lookupColumnByName(org.apache.parquet.io.GroupColumnIO groupColumnIO, String columnName)
    Parquet's column names are case in sensitive.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ParquetSplitReaderUtil

      public ParquetSplitReaderUtil()
  • Method Details

    • genPartColumnarRowReader

      public static ParquetColumnarRowSplitReader genPartColumnarRowReader(boolean utcTimestamp, boolean caseSensitive, org.apache.hadoop.conf.Configuration conf, String[] fullFieldNames, org.apache.flink.table.types.DataType[] fullFieldTypes, Map<String,Object> partitionSpec, int[] selectedFields, int batchSize, org.apache.flink.core.fs.Path path, long splitStart, long splitLength) throws IOException
      Util for generating partitioned ParquetColumnarRowSplitReader.
      Throws:
      IOException
    • createVectorFromConstant

      public static org.apache.flink.table.data.columnar.vector.ColumnVector createVectorFromConstant(org.apache.flink.table.types.logical.LogicalType type, Object value, int batchSize)
    • createColumnReader

      public static ColumnReader createColumnReader(boolean isUtcTimestamp, org.apache.flink.table.types.logical.LogicalType fieldType, org.apache.parquet.schema.Type type, List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors, org.apache.parquet.column.page.PageReadStore pages, ParquetField field, int depth) throws IOException
      Throws:
      IOException
    • createWritableColumnVector

      public static org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector createWritableColumnVector(int batchSize, org.apache.flink.table.types.logical.LogicalType fieldType, org.apache.parquet.schema.Type type, List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors, int depth)
    • buildFieldsList

      public static List<ParquetField> buildFieldsList(List<org.apache.flink.table.types.logical.RowType.RowField> childrens, List<String> fieldNames, org.apache.parquet.io.MessageColumnIO columnIO)
    • lookupColumnByName

      public static org.apache.parquet.io.ColumnIO lookupColumnByName(org.apache.parquet.io.GroupColumnIO groupColumnIO, String columnName)
      Parquet's column names are case in sensitive. So when we look up columns we first check for exact match, and if that can not find we look for a case-insensitive match.
    • getMapKeyValueColumn

      public static org.apache.parquet.io.GroupColumnIO getMapKeyValueColumn(org.apache.parquet.io.GroupColumnIO groupColumnIO)
    • getArrayElementColumn

      public static org.apache.parquet.io.ColumnIO getArrayElementColumn(org.apache.parquet.io.ColumnIO columnIO)