Class ParquetSplitReaderUtil
java.lang.Object
org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil
Util for generating
ParquetColumnarRowSplitReader.-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic List<ParquetField>buildFieldsList(List<org.apache.flink.table.types.logical.RowType.RowField> childrens, List<String> fieldNames, org.apache.parquet.io.MessageColumnIO columnIO) static ColumnReadercreateColumnReader(boolean isUtcTimestamp, org.apache.flink.table.types.logical.LogicalType fieldType, org.apache.parquet.schema.Type type, List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors, org.apache.parquet.column.page.PageReadStore pages, ParquetField field, int depth) static org.apache.flink.table.data.columnar.vector.ColumnVectorcreateVectorFromConstant(org.apache.flink.table.types.logical.LogicalType type, Object value, int batchSize) static org.apache.flink.table.data.columnar.vector.writable.WritableColumnVectorcreateWritableColumnVector(int batchSize, org.apache.flink.table.types.logical.LogicalType fieldType, org.apache.parquet.schema.Type type, List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors, int depth) genPartColumnarRowReader(boolean utcTimestamp, boolean caseSensitive, org.apache.hadoop.conf.Configuration conf, String[] fullFieldNames, org.apache.flink.table.types.DataType[] fullFieldTypes, Map<String, Object> partitionSpec, int[] selectedFields, int batchSize, org.apache.flink.core.fs.Path path, long splitStart, long splitLength) Util for generating partitionedParquetColumnarRowSplitReader.static org.apache.parquet.io.ColumnIOgetArrayElementColumn(org.apache.parquet.io.ColumnIO columnIO) static org.apache.parquet.io.GroupColumnIOgetMapKeyValueColumn(org.apache.parquet.io.GroupColumnIO groupColumnIO) static org.apache.parquet.io.ColumnIOlookupColumnByName(org.apache.parquet.io.GroupColumnIO groupColumnIO, String columnName) Parquet's column names are case in sensitive.
-
Constructor Details
-
ParquetSplitReaderUtil
public ParquetSplitReaderUtil()
-
-
Method Details
-
genPartColumnarRowReader
public static ParquetColumnarRowSplitReader genPartColumnarRowReader(boolean utcTimestamp, boolean caseSensitive, org.apache.hadoop.conf.Configuration conf, String[] fullFieldNames, org.apache.flink.table.types.DataType[] fullFieldTypes, Map<String, Object> partitionSpec, int[] selectedFields, int batchSize, org.apache.flink.core.fs.Path path, long splitStart, long splitLength) throws IOExceptionUtil for generating partitionedParquetColumnarRowSplitReader.- Throws:
IOException
-
createVectorFromConstant
public static org.apache.flink.table.data.columnar.vector.ColumnVector createVectorFromConstant(org.apache.flink.table.types.logical.LogicalType type, Object value, int batchSize) -
createColumnReader
public static ColumnReader createColumnReader(boolean isUtcTimestamp, org.apache.flink.table.types.logical.LogicalType fieldType, org.apache.parquet.schema.Type type, List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors, org.apache.parquet.column.page.PageReadStore pages, ParquetField field, int depth) throws IOException - Throws:
IOException
-
createWritableColumnVector
public static org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector createWritableColumnVector(int batchSize, org.apache.flink.table.types.logical.LogicalType fieldType, org.apache.parquet.schema.Type type, List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors, int depth) -
buildFieldsList
public static List<ParquetField> buildFieldsList(List<org.apache.flink.table.types.logical.RowType.RowField> childrens, List<String> fieldNames, org.apache.parquet.io.MessageColumnIO columnIO) -
lookupColumnByName
public static org.apache.parquet.io.ColumnIO lookupColumnByName(org.apache.parquet.io.GroupColumnIO groupColumnIO, String columnName) Parquet's column names are case in sensitive. So when we look up columns we first check for exact match, and if that can not find we look for a case-insensitive match. -
getMapKeyValueColumn
public static org.apache.parquet.io.GroupColumnIO getMapKeyValueColumn(org.apache.parquet.io.GroupColumnIO groupColumnIO) -
getArrayElementColumn
public static org.apache.parquet.io.ColumnIO getArrayElementColumn(org.apache.parquet.io.ColumnIO columnIO)
-