Package org.apache.flink.formats.parquet
Class ParquetColumnarRowInputFormat<SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
java.lang.Object
org.apache.flink.formats.parquet.ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT>
org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat<SplitT>
- All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>, org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT>, org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
public class ParquetColumnarRowInputFormat<SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
extends ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT>
implements org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
A ParquetVectorizedInputFormat that provides a RowData iterator. Uses ColumnarRowData to provide a row view of the column batch.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat
ParquetVectorizedInputFormat.ParquetReaderBatch<T>
Field Summary
Fields inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat
hadoopConfig, isUtcTimestamp
Constructor Summary
Constructors
ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType projectedType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Constructor to create a Parquet format without extra fields.
Method Summary
static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> ParquetColumnarRowInputFormat<SplitT>
createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType producedRowType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Create a partitioned ParquetColumnarRowInputFormat; the partition columns can be generated from the split's Path.

protected ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>
createReaderBatch(org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector[] writableVectors, org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch columnarBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>> recycler)

org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>
getProducedType()

protected int
numBatchesToCirculate(org.apache.flink.configuration.Configuration config)

org.apache.flink.table.plan.stats.TableStats
reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)

Methods inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat
createReader, isSplittable, restoreReader
-
Constructor Details
-
ParquetColumnarRowInputFormat
public ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType projectedType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Constructor to create a Parquet format without extra fields.
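A minimal usage sketch of this constructor. The schema, batch size, and use of InternalTypeInfo to derive the produced TypeInformation are illustrative assumptions, not prescribed by this class:

```java
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;

// Hypothetical projected schema: (id INT, name STRING).
RowType projectedType =
        RowType.of(
                new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                new String[] {"id", "name"});

ParquetColumnarRowInputFormat<FileSourceSplit> format =
        new ParquetColumnarRowInputFormat<>(
                new Configuration(),                // Hadoop configuration for the filesystem
                projectedType,                      // columns to read from the Parquet files
                InternalTypeInfo.of(projectedType), // produced TypeInformation<RowData>
                500,                                // batchSize: rows per vectorized batch
                true,                               // isUtcTimestamp: read timestamps as UTC
                true);                              // isCaseSensitive: case-sensitive column matching
```

The resulting format can be handed to a FileSource builder as a BulkFormat, since it implements BulkFormat<RowData, SplitT>.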
-
-
Method Details
-
numBatchesToCirculate
protected int numBatchesToCirculate(org.apache.flink.configuration.Configuration config)
- Overrides:
numBatchesToCirculate in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
-
createReaderBatch
protected ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData> createReaderBatch(org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector[] writableVectors, org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch columnarBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>> recycler)
- Specified by:
createReaderBatch in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
- Parameters:
writableVectors - vectors to be written
columnarBatch - vectors to be read
recycler - batch recycler
-
getProducedType
public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
- Specified by:
getProducedType in interface org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
- Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>
-
reportStatistics
public org.apache.flink.table.plan.stats.TableStats reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
- Specified by:
reportStatistics in interface org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
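A sketch of how a planner-side caller might invoke this method. The file path is hypothetical, and `format` and `producedDataType` are assumed to come from the surrounding context:

```java
import java.util.Collections;
import java.util.List;
import org.apache.flink.core.fs.Path;
import org.apache.flink.table.plan.stats.TableStats;

// Hypothetical Parquet file to gather statistics from.
List<Path> files = Collections.singletonList(new Path("hdfs:///warehouse/t/part-0.parquet"));

// `format` is a ParquetColumnarRowInputFormat and `producedDataType` describes
// the rows it produces (both assumed to exist in the enclosing code).
TableStats stats = format.reportStatistics(files, producedDataType);
long rowCount = stats.getRowCount(); // row count derived from Parquet file footers
```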
-
createPartitionedFormat
public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> ParquetColumnarRowInputFormat<SplitT> createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType producedRowType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Create a partitioned ParquetColumnarRowInputFormat; the partition columns can be generated from the split's Path.
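A usage sketch for the partitioned factory. The schema, the "dt" partition key, the default-partition name, and the batch size are illustrative assumptions; PartitionFieldExtractor.forFileSystem builds an extractor that parses partition values from the split's path:

```java
import java.util.Collections;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.file.table.PartitionFieldExtractor;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;

// Hypothetical produced schema (id INT, dt STRING), where "dt" is a
// partition column taken from paths like .../dt=2024-01-01/part-0.parquet.
RowType producedRowType =
        RowType.of(
                new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                new String[] {"id", "dt"});

ParquetColumnarRowInputFormat<FileSourceSplit> format =
        ParquetColumnarRowInputFormat.createPartitionedFormat(
                new Configuration(),
                producedRowType,                       // full schema, including partition columns
                InternalTypeInfo.of(producedRowType),  // produced TypeInformation<RowData>
                Collections.singletonList("dt"),       // partition keys
                PartitionFieldExtractor.forFileSystem("__DEFAULT_PARTITION__"),
                2048,                                  // batchSize
                true,                                  // isUtcTimestamp
                true);                                 // isCaseSensitive
```

Columns listed in partitionKeys are filled from the extractor rather than read from the Parquet files themselves.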
-