class VectorizedParquetRecordReader extends SpecificParquetRecordReaderBase[AnyRef]
A specialized RecordReader that reads into InternalRows or ColumnarBatches directly using the Parquet column APIs. This is somewhat based on parquet-mr's ColumnReader.
TODO: handle complex types, decimals requiring more than 8 bytes, INT96, and schema mismatch. All of these can be handled efficiently and easily with codegen.
This class can return either InternalRows or ColumnarBatches. With whole-stage codegen enabled, it returns ColumnarBatches, which offer significant performance gains. TODO: make this always return ColumnarBatches.
Linear Supertypes
SpecificParquetRecordReaderBase, RecordReader, Closeable, AutoCloseable, AnyRef, Any
Instance Constructors
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
def
close(): Unit
- Definition Classes
- VectorizedParquetRecordReader → SpecificParquetRecordReaderBase → RecordReader → Closeable → AutoCloseable
- Annotations
- @Override()
-
def
enableReturningBatches(): Unit
Can be called before any rows are returned to enable returning columnar batches directly.
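The difference between the two return modes can be sketched with a toy model. Everything below (ToyRow, ToyBatch, ToyReader, countRows) is a hypothetical stand-in that only borrows the method names listed on this page; it is not Spark's implementation, and rejecting a late call to enableReturningBatches is a choice made by the toy, not documented behavior.

```scala
// Toy model of a reader whose getCurrentValue() yields either rows or batches.
// All types here are hypothetical stand-ins, not Spark classes.
object ReturnModeSketch {
  final case class ToyRow(value: Int)
  final case class ToyBatch(rows: Vector[ToyRow])

  final class ToyReader(data: Vector[Int], batchSize: Int) {
    private var returnBatches = false
    private var started = false
    private var pos = 0
    private var current: AnyRef = null

    // Assumption of this toy: switching modes after reading has begun is rejected.
    def enableReturningBatches(): Unit = {
      require(!started, "must be called before any rows are returned")
      returnBatches = true
    }

    // Advances by one row, or by one batch when batch mode is enabled.
    def nextKeyValue(): Boolean = {
      started = true
      if (pos >= data.length) {
        false
      } else {
        if (returnBatches) {
          val chunk = data.slice(pos, pos + batchSize)
          current = ToyBatch(chunk.map(v => ToyRow(v)))
          pos += chunk.length
        } else {
          current = ToyRow(data(pos))
          pos += 1
        }
        true
      }
    }

    // Typed as AnyRef because the value may be a row or a batch.
    def getCurrentValue(): AnyRef = current
  }

  // A caller that works in either mode has to inspect the returned value.
  def countRows(reader: ToyReader): Int = {
    var n = 0
    while (reader.nextKeyValue()) {
      reader.getCurrentValue() match {
        case ToyBatch(rows) => n += rows.length
        case _: ToyRow      => n += 1
        case other          => throw new IllegalStateException(s"unexpected value: $other")
      }
    }
    n
  }
}
```

In this sketch the same countRows loop handles both modes, but in batch mode it makes one call per batch instead of one call per row.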
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
getCurrentKey(): Void
- Definition Classes
- SpecificParquetRecordReaderBase → RecordReader
- Annotations
- @Override()
-
def
getCurrentValue(): AnyRef
- Definition Classes
- VectorizedParquetRecordReader → RecordReader
- Annotations
- @Override()
-
def
getProgress(): Float
- Definition Classes
- VectorizedParquetRecordReader → RecordReader
- Annotations
- @Override()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
initBatch(partitionColumns: StructType, partitionValues: InternalRow): Unit
-
def
initialize(path: String, columns: List[String]): Unit
Utility API that will read all the data in path. This circumvents the need to create Hadoop objects to use this class.
columns can contain the list of columns to project.
- Definition Classes
- VectorizedParquetRecordReader → SpecificParquetRecordReaderBase
- Annotations
- @Override()
-
def
initialize(inputSplit: InputSplit, taskAttemptContext: TaskAttemptContext): Unit
Implementation of RecordReader API.
- Definition Classes
- VectorizedParquetRecordReader → SpecificParquetRecordReaderBase → RecordReader
- Annotations
- @Override()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
nextBatch(): Boolean
Advances to the next batch of rows. Returns false if there are no more.
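The contract above (advance, then stop when false is returned) suggests a simple read loop. The sketch below runs against a hypothetical in-memory ToyReader that mirrors only the nextBatch, resultBatch, and close names from this page; ToyBatch, its fields, and sumAll are illustrative assumptions, not Spark APIs.

```scala
// Toy stand-ins that mirror the documented method names; NOT Spark's classes.
object NextBatchSketch {
  final class ToyBatch(capacity: Int) {
    val values: Array[Int] = new Array[Int](capacity)
    var numRows: Int = 0
  }

  final class ToyReader(data: Array[Int], capacity: Int) extends AutoCloseable {
    private val batch = new ToyBatch(capacity)
    private var pos = 0
    var closed = false

    def resultBatch(): ToyBatch = batch

    // Fills the shared batch with up to `capacity` values.
    // Returns false once the input is exhausted.
    def nextBatch(): Boolean = {
      if (pos >= data.length) {
        batch.numRows = 0
        false
      } else {
        val n = math.min(capacity, data.length - pos)
        System.arraycopy(data, pos, batch.values, 0, n)
        batch.numRows = n
        pos += n
        true
      }
    }

    override def close(): Unit = { closed = true }
  }

  // Drains the reader batch by batch and always closes it afterwards.
  def sumAll(reader: ToyReader): Long = {
    val batch = reader.resultBatch()
    var total = 0L
    try {
      while (reader.nextBatch()) {
        var i = 0
        while (i < batch.numRows) {
          total += batch.values(i)
          i += 1
        }
      }
    } finally {
      reader.close()
    }
    total
  }
}
```

The try/finally mirrors the fact that the reader is Closeable: the loop releases it even if processing a batch throws.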
-
def
nextKeyValue(): Boolean
- Definition Classes
- VectorizedParquetRecordReader → RecordReader
- Annotations
- @Override()
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
resultBatch(): ColumnarBatch
Returns the ColumnarBatch object that will be used for all rows returned by this reader. This object is reused. Calling this enables the vectorized reader. This should be called before any calls to nextKeyValue/nextBatch.
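Because the returned object is reused, a caller that wants to keep data past the next advance has to copy it out. The following is a minimal sketch of that hazard using hypothetical toy types (ToyBatch, ToyReader, snapshot); it models the reuse described above and makes no claim about how Spark's ColumnarBatch stores its data.

```scala
// Toy model of a reader that hands out one reused batch object.
// All names are hypothetical stand-ins, not Spark classes.
object ReusedBatchSketch {
  final class ToyBatch(capacity: Int) {
    val values: Array[Int] = new Array[Int](capacity)
    var numRows: Int = 0
    // Copies the currently valid rows into an immutable collection.
    def snapshot(): Vector[Int] = values.take(numRows).toVector
  }

  final class ToyReader(data: Vector[Int], capacity: Int) {
    private val batch = new ToyBatch(capacity)
    private val chunks = data.grouped(capacity)

    def resultBatch(): ToyBatch = batch

    // Overwrites the shared batch in place on every call.
    def nextBatch(): Boolean = {
      if (chunks.hasNext) {
        val chunk = chunks.next()
        chunk.copyToArray(batch.values)
        batch.numRows = chunk.length
        true
      } else {
        false
      }
    }
  }

  // Two successive batches are the very same object.
  def referencesAreShared(reader: ToyReader): Boolean = {
    reader.nextBatch()
    val first = reader.resultBatch()
    reader.nextBatch()
    val second = reader.resultBatch()
    first eq second
  }

  // Retaining data safely: copy each batch before advancing.
  def copyEachBatch(reader: ToyReader): Vector[Vector[Int]] = {
    val batch = reader.resultBatch()
    val out = Vector.newBuilder[Vector[Int]]
    while (reader.nextBatch()) {
      out += batch.snapshot()
    }
    out.result()
  }
}
```

Storing the batch reference itself in a collection would leave every element pointing at the same, last-written contents, which is why the sketch snapshots before each advance.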
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated