package rapids
Type Members
- case class AvroBatchContext(origChunkedBlocks: LinkedHashMap[Path, ArrayBuffer[DataBlockBase]], schema: SchemaBase, mergedHeader: Header) extends BatchContext with Product with Serializable
-
case class
AvroBlockMeta(header: Header, headerSize: Long, blocks: Seq[BlockInfo]) extends Product with Serializable
Avro block meta info
- header
the header of avro file
- blocks
the total block info of avro file
-
case class
AvroDataBlock(blockInfo: BlockInfo) extends DataBlockBase with Product with Serializable
avro BlockInfo wrapper
-
case class
AvroExtraInfo() extends ExtraInfo with Product with Serializable
Extra information
-
case class
AvroFileFilterHandler(hadoopConf: Configuration, options: AvroOptions) extends Arm with Logging with Product with Serializable
A tool to filter Avro blocks
- class AvroProviderImpl extends AvroProvider
-
case class
AvroSchemaWrapper(schema: Schema) extends SchemaBase with Product with Serializable
avro schema wrapper
- case class AvroSingleDataBlockInfo(filePath: Path, dataBlock: AvroDataBlock, partitionValues: InternalRow, schema: AvroSchemaWrapper, extraInfo: AvroExtraInfo) extends SingleDataBlockInfo with Product with Serializable
- trait BasePad extends TernaryExpression with GpuTernaryExpression with ImplicitCastInputTypes with NullIntolerant
-
class
BasicColumnarWriteJobStatsTracker extends ColumnarWriteJobStatsTracker
Simple ColumnarWriteJobStatsTracker implementation that's serializable, capable of instantiating BasicColumnarWriteTaskStatsTracker on executors and processing the BasicColumnarWriteTaskStats they produce by aggregating the metrics and posting them as DriverMetricUpdates.
-
case class
BasicColumnarWriteTaskStats(numPartitions: Int, numFiles: Int, numBytes: Long, numRows: Long) extends WriteTaskStats with Product with Serializable
Simple metrics collected during an instance of GpuFileFormatDataWriter. These were first introduced in https://github.com/apache/spark/pull/18159 (SPARK-20703).
-
class
BasicColumnarWriteTaskStatsTracker extends ColumnarWriteTaskStatsTracker with Logging
Simple metrics collected during an instance of GpuFileFormatDataWriter. This is the columnar version of org.apache.spark.sql.execution.datasources.BasicWriteTaskStatsTracker.
-
trait
ColumnarWriteJobStatsTracker extends Serializable
A class implementing this trait is basically a collection of parameters that are necessary for instantiating a (derived type of) ColumnarWriteTaskStatsTracker on all executors and then processing the statistics produced by them (e.g. save them to memory/disk, issue warnings, etc). It is therefore important that such an object is Serializable, as it will be sent from the driver to all executors.
-
trait
ColumnarWriteTaskStatsTracker extends AnyRef
A trait for classes that are capable of collecting statistics on columnar data that's being processed by a single write task in GpuFileFormatDataWriter - i.e. there should be one instance per executor.
newPartition event is only triggered if the relation to be written out is partitioned.
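As a rough illustration of how per-task statistics might be combined on the driver, the sketch below sums the four metrics carried by BasicColumnarWriteTaskStats. TaskStats and JobStatsSketch are hypothetical stand-ins written in plain Scala; they are not the plugin's classes and do not post any driver metrics.

```scala
// Hypothetical stand-in mirroring the fields of BasicColumnarWriteTaskStats.
final case class TaskStats(numPartitions: Int, numFiles: Int, numBytes: Long, numRows: Long)

object JobStatsSketch {
  // Driver-side aggregation: sum each metric over the stats reported by every task.
  def aggregate(perTask: Seq[TaskStats]): TaskStats =
    perTask.foldLeft(TaskStats(0, 0, 0L, 0L)) { (acc, s) =>
      TaskStats(
        acc.numPartitions + s.numPartitions,
        acc.numFiles + s.numFiles,
        acc.numBytes + s.numBytes,
        acc.numRows + s.numRows)
    }
}
```

For example, JobStatsSketch.aggregate(Seq(TaskStats(1, 2, 100L, 10L), TaskStats(0, 1, 50L, 5L))) yields TaskStats(1, 3, 150L, 15L).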
- trait CpuToGpuAggregateBufferConverter extends AnyRef
- trait CpuToGpuBufferTransition extends UnaryExpression with ShimUnaryExpression with CodegenFallback
- class CpuToGpuCollectBufferConverter extends CpuToGpuAggregateBufferConverter
- case class CpuToGpuCollectBufferTransition(child: Expression, elementType: DataType) extends UnaryExpression with CpuToGpuBufferTransition with Product with Serializable
- trait CudfAggregate extends AnyRef
- abstract class CudfBinaryArithmetic extends CudfBinaryOperator with NullIntolerant
- abstract class CudfBinaryComparison extends CudfBinaryOperator with Predicate
- abstract class CudfBinaryMathExpression extends BinaryExpression with CudfBinaryExpression with Serializable with ImplicitCastInputTypes
- abstract class CudfBinaryPredicateWithSideEffect extends CudfBinaryOperator with Predicate
- class CudfCollectList extends CudfAggregate
- class CudfCollectSet extends CudfAggregate
- class CudfCount extends CudfAggregate
- class CudfM2 extends CudfAggregate
- class CudfMax extends CudfAggregate
-
class
CudfMean extends CudfAggregate
This class is only used by the M2 class aggregates; do not confuse it with GpuAverage. In the future, this aggregate class should be removed and the mean values should be generated in the output of libcudf's M2 aggregate.
- class CudfMergeLists extends CudfAggregate
- class CudfMergeM2 extends CudfAggregate
- class CudfMergeSets extends CudfAggregate
- class CudfMin extends CudfAggregate
- class CudfNthLikeAggregate extends CudfAggregate
- class CudfSum extends CudfAggregate with Arm
- abstract class CudfUnaryMathExpression extends GpuUnaryMathExpression with CudfUnaryExpression
- case class GpuAbs(child: Expression, failOnError: Boolean) extends GpuUnaryExpression with CudfUnaryExpression with ExpectsInputTypes with NullIntolerant with Product with Serializable
- case class GpuAcos(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuAcoshCompat(child: Expression) extends GpuUnaryMathExpression with Product with Serializable
- case class GpuAcoshImproved(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuAdd(left: Expression, right: Expression, failOnError: Boolean) extends CudfBinaryArithmetic with Product with Serializable
- case class GpuAggregateExpression(origAggregateFunction: GpuAggregateFunction, mode: AggregateMode, isDistinct: Boolean, filter: Option[Expression], resultId: ExprId) extends Expression with GpuExpression with ShimExpression with GpuUnevaluable with Product with Serializable
-
trait
GpuAggregateFunction extends Expression with GpuExpression with ShimExpression with GpuUnevaluable
Trait that all aggregate functions implement.
Aggregates start with some input from the child plan or from another aggregate (or from itself if the aggregate is merging several batches).
In general terms an aggregate function can be in one of two modes of operation: update or merge. Either the function is aggregating raw input, or it is merging previously aggregated data. Normally, Spark breaks up the processing of the aggregate into two exec nodes (a partial aggregate and a final), and they are separated by a shuffle boundary. That is not true for all aggregates, especially when looking at other flavors of Spark. What doesn't change is the core function of updating or merging. Note that an aggregate can merge right after an update is performed, as we have cases where input batches are update-aggregated and then a bigger batch is built by merging together those pre-aggregated inputs.
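The update/merge distinction described above can be sketched with an average whose aggregation buffer is a (sum, count) pair. This is a minimal plain-Scala model under stated assumptions: AvgBuffer and AvgSketch are hypothetical names, and nothing here uses Spark or cuDF.

```scala
// Hypothetical aggregation buffer for an average: a running sum and a row count.
final case class AvgBuffer(sum: Double, count: Long)

object AvgSketch {
  // Update mode: aggregate raw input values into a buffer.
  def update(batch: Seq[Double]): AvgBuffer =
    AvgBuffer(batch.sum, batch.size.toLong)

  // Merge mode: combine previously aggregated buffers into one buffer.
  def merge(buffers: Seq[AvgBuffer]): AvgBuffer =
    AvgBuffer(buffers.map(_.sum).sum, buffers.map(_.count).sum)

  // Final evaluation: turn the merged buffer into the result (None for no rows).
  def evaluate(b: AvgBuffer): Option[Double] =
    if (b.count == 0) None else Some(b.sum / b.count)
}
```

Update-aggregating each batch and then merging the buffers gives the same answer as a single update over all rows: AvgSketch.evaluate(AvgSketch.merge(Seq(Seq(1.0, 2.0), Seq(3.0)).map(AvgSketch.update))) is Some(2.0).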
Aggregates have an interface to Spark and that is defined by aggBufferAttributes. This collection of attributes must match the Spark equivalent of the aggregate, so that if half of the aggregate (update or merge) executes on the CPU, we can be compatible. The GpuAggregateFunction adds special steps to ensure that it can produce (and consume) batches in the shape of aggBufferAttributes.
The general transitions that are implemented in the aggregate function are as follows:
1) inputProjection -> updateAggregates: inputProjection creates a sequence of values that are operated on by the updateAggregates. The length of inputProjection must be the same as updateAggregates, and updateAggregates (cuDF aggregates) should be able to work with the product of the inputProjection (i.e. types are compatible).
2) updateAggregates -> postUpdate: after the cuDF update aggregate, a post process step can (optionally) be performed. The postUpdate takes the output of updateAggregate that must match the order of columns and types as specified in aggBufferAttributes.
3) postUpdate -> preMerge: preMerge prepares batches before going into the mergeAggregate. The preMerge step binds to aggBufferAttributes, so it can be used to transform a Spark compatible batch to a batch that the cuDF merge aggregate expects. Its input has the same shape as that produced by postUpdate.
4) mergeAggregates -> postMerge: postMerge optionally transforms the output of the cuDF merge aggregate in two situations:
1 - The step is used to match the aggBufferAttributes references for partial aggregates where each partially aggregated batch is getting merged with AggHelper(merge=true).
2 - In a final aggregate where the merged batches are transformed to what evaluateExpression expects. For simple aggregates like sum or count, evaluateExpression is just aggBufferAttributes, but for more complex aggregates, it is an expression (see GpuAverage and GpuM2 subclasses) that relies on the merge step producing columns in the shape of aggBufferAttributes.
- case class GpuAnd(left: Expression, right: Expression) extends CudfBinaryPredicateWithSideEffect with Product with Serializable
- trait GpuArrayBinaryLike extends Expression with GpuComplexTypeMergingExpression with NullIntolerant
-
case class
GpuArrayContains(left: Expression, right: Expression) extends BinaryExpression with GpuBinaryExpression with NullIntolerant with Product with Serializable
Checks if the array (left) has the element (right)
- case class GpuArrayExcept(left: Expression, right: Expression) extends Expression with GpuArrayBinaryLike with ExpectsInputTypes with Product with Serializable
- case class GpuArrayIntersect(left: Expression, right: Expression) extends Expression with GpuArrayBinaryLike with ExpectsInputTypes with Product with Serializable
- case class GpuArrayMax(child: Expression) extends GpuUnaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuArrayMin(child: Expression) extends GpuUnaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuArrayRepeat(left: Expression, right: Expression) extends BinaryExpression with GpuBinaryExpression with Product with Serializable
- case class GpuArrayUnion(left: Expression, right: Expression) extends Expression with GpuArrayBinaryLike with ExpectsInputTypes with Product with Serializable
- case class GpuArraysOverlap(left: Expression, right: Expression) extends BinaryExpression with GpuBinaryExpression with ExpectsInputTypes with NullIntolerant with Product with Serializable
- case class GpuArraysZip(children: Seq[Expression]) extends Expression with GpuExpression with ShimExpression with ExpectsInputTypes with Product with Serializable
- case class GpuAsin(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuAsinhCompat(child: Expression) extends GpuUnaryMathExpression with Product with Serializable
- case class GpuAsinhImproved(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
-
case class
GpuAssembleSumChunks(chunkAttrs: Seq[AttributeReference], dataType: DecimalType, nullOnOverflow: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable
Reassembles a 128-bit value from four separate 64-bit sum results
- chunkAttrs
attributes for the four 64-bit sum chunks ordered from least significant to most significant
- dataType
output type of the reconstructed 128-bit value
- nullOnOverflow
whether to produce null on overflows
- case class GpuAtan(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuAtanh(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- abstract class GpuAverage extends Expression with GpuAggregateFunction with GpuReplaceWindowFunction with Serializable
-
case class
GpuAvroMultiFilePartitionReaderFactory(sqlConf: SQLConf, rapidsConf: RapidsConf, broadcastedConf: Broadcast[SerializableConfiguration], dataSchema: StructType, readDataSchema: StructType, partitionSchema: StructType, options: AvroOptions, metrics: Map[String, GpuMetric], filters: Array[Filter], queryUsesInputFile: Boolean) extends MultiFilePartitionReaderFactoryBase with Product with Serializable
The multi-file partition reader factory for cloud or coalescing reading of avro file format.
-
class
GpuAvroPartitionReader extends FilePartitionReaderBase with GpuAvroReaderBase
A PartitionReader that reads an AVRO file split on the GPU.
-
case class
GpuAvroPartitionReaderFactory(sqlConf: SQLConf, rapidsConf: RapidsConf, broadcastedConf: Broadcast[SerializableConfiguration], dataSchema: StructType, readDataSchema: StructType, partitionSchema: StructType, avroOptions: AvroOptions, metrics: Map[String, GpuMetric], params: Map[String, String]) extends ShimFilePartitionReaderFactory with Logging with Product with Serializable
Avro partition reader factory to build columnar reader
-
trait
GpuAvroReaderBase extends Arm with Logging
A trait collecting common methods across the 3 kinds of avro readers
- case class GpuAvroScan(sparkSession: SparkSession, fileIndex: PartitioningAwareFileIndex, dataSchema: StructType, readDataSchema: StructType, readPartitionSchema: StructType, options: CaseInsensitiveStringMap, pushedFilters: Array[Filter], rapidsConf: RapidsConf, partitionFilters: Seq[Expression] = Seq.empty, dataFilters: Seq[Expression] = Seq.empty, queryUsesInputFile: Boolean = false) extends FileScan with ScanWithMetrics with Product with Serializable
- case class GpuBRound(child: Expression, scale: Expression, outputType: DataType) extends GpuRoundBase with Product with Serializable
- case class GpuBasicAverage(child: Expression, dt: DataType) extends GpuAverage with Product with Serializable
- case class GpuBasicDecimalAverage(child: Expression, dt: DecimalType) extends GpuDecimalAverage with Product with Serializable
-
case class
GpuBasicDecimalSum(child: Expression, dt: DecimalType, failOnErrorOverride: Boolean) extends GpuDecimalSum with Product with Serializable
Sum aggregations for decimals up to and including DECIMAL64
-
case class
GpuBasicSum(child: Expression, resultType: DataType, failOnErrorOverride: Boolean) extends GpuSum with Product with Serializable
Sum aggregation for non-decimal types
- case class GpuBitLength(child: Expression) extends GpuUnaryExpression with ExpectsInputTypes with Product with Serializable
- case class GpuBitwiseAnd(left: Expression, right: Expression) extends CudfBinaryArithmetic with Product with Serializable
- case class GpuBitwiseNot(child: Expression) extends GpuUnaryExpression with CudfUnaryExpression with ExpectsInputTypes with Product with Serializable
- case class GpuBitwiseOr(left: Expression, right: Expression) extends CudfBinaryArithmetic with Product with Serializable
- case class GpuBitwiseXor(left: Expression, right: Expression) extends CudfBinaryArithmetic with Product with Serializable
- class GpuCartesianPartition extends Partition
- case class GpuCartesianProductExec(left: SparkPlan, right: SparkPlan, condition: Option[Expression], targetSizeBytes: Long) extends SparkPlan with ShimBinaryExecNode with GpuExec with Product with Serializable
- class GpuCartesianRDD extends RDD[ColumnarBatch] with Serializable with Arm
- case class GpuCbrt(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuCeil(child: Expression, outputType: DataType) extends CudfUnaryMathExpression with Product with Serializable
-
case class
GpuCheckOverflowAfterSum(data: Expression, isEmpty: Expression, dataType: DecimalType, nullOnOverflow: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable
This is equivalent to what Spark does after a sum to check for overflow:
If(isEmpty, Literal.create(null, resultType), CheckOverflowInSum(sum, d, !SQLConf.get.ansiEnabled))
But we are renaming it to avoid confusion with the overflow detection we do as a part of sum itself that takes the place of the overflow checking that happens with add.
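The shape of that check can be modeled in plain Scala as follows. This is a simplified sketch, not the plugin's or Spark's code: OverflowAfterSumSketch and its parameters are hypothetical, Option stands in for SQL null, and only the precision of a scala.math.BigDecimal is compared (scale adjustment is ignored).

```scala
object OverflowAfterSumSketch {
  // Returns None (SQL null) for an empty input, or when the sum overflows and
  // nullOnOverflow is set; throws on overflow otherwise (ANSI-style behavior).
  def checkOverflowAfterSum(
      sum: BigDecimal,
      isEmpty: Boolean,
      precision: Int,
      nullOnOverflow: Boolean): Option[BigDecimal] = {
    if (isEmpty) None
    else if (sum.precision <= precision) Some(sum)
    else if (nullOnOverflow) None
    else throw new ArithmeticException(s"$sum cannot be represented with precision $precision")
  }
}
```

With precision 5, a sum of 123.45 passes through unchanged; with precision 4 it becomes None when nullOnOverflow is true and raises an ArithmeticException when it is false.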
- trait GpuCollectBase extends Expression with GpuAggregateFunction with GpuDeterministicFirstLastCollectShim with GpuAggregateWindowFunction
-
case class
GpuCollectList(child: Expression, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends Expression with GpuCollectBase with Product with Serializable
Collects and returns a list of non-unique elements.
The two 'offset' parameters are not used by GPU version, but are here for the compatibility with the CPU version and automated checks.
-
case class
GpuCollectSet(child: Expression, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends Expression with GpuCollectBase with Product with Serializable
Collects and returns a set of unique elements.
The two 'offset' parameters are not used by GPU version, but are here for the compatibility with the CPU version and automated checks.
- case class GpuConcat(children: Seq[Expression]) extends Expression with GpuComplexTypeMergingExpression with Product with Serializable
- case class GpuConcatWs(children: Seq[Expression]) extends Expression with GpuExpression with ShimExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuContains(left: Expression, right: Expression) extends BinaryExpression with GpuBinaryExpression with Predicate with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuCos(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuCosh(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuCot(child: Expression) extends GpuUnaryMathExpression with Product with Serializable
- case class GpuCount(children: Seq[Expression]) extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Product with Serializable
- case class GpuCreateArray(children: Seq[Expression], useStringTypeWhenEmpty: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable
- case class GpuCreateDataSourceTableAsSelectCommand(table: CatalogTable, mode: SaveMode, query: LogicalPlan, outputColumnNames: Seq[String], origProvider: Class[_], gpuFileFormat: ColumnarFileFormat, useStableSort: Boolean) extends LogicalPlan with GpuDataWritingCommand with Product with Serializable
- case class GpuCreateMap(children: Seq[Expression], useStringTypeWhenEmpty: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable
- case class GpuCreateNamedStruct(children: Seq[Expression]) extends Expression with GpuExpression with ShimExpression with Product with Serializable
-
case class
GpuDataSource(sparkSession: SparkSession, className: String, paths: Seq[String] = Nil, userSpecifiedSchema: Option[StructType] = None, partitionColumns: Seq[String] = Seq.empty, bucketSpec: Option[BucketSpec] = None, options: Map[String, String] = Map.empty, catalogTable: Option[CatalogTable] = None, origProvider: Class[_], gpuFileFormat: ColumnarFileFormat) extends Logging with Product with Serializable
A truncated version of Spark DataSource that converts to use the GPU version of InsertIntoHadoopFsRelationCommand for FileFormats we support. This does not support DataSource V2 writing at this point because at the time of copying, it did not.
-
trait
GpuDataSourceScanExec extends SparkPlan with LeafExecNode with GpuExec
GPU implementation of Spark's DataSourceScanExec
- case class GpuDateAdd(startDate: Expression, days: Expression) extends BinaryExpression with GpuDateMathBase with Product with Serializable
- case class GpuDateAddInterval(start: Expression, interval: Expression, timeZoneId: Option[String] = None, ansiEnabled: Boolean = SQLConf.get.ansiEnabled) extends GpuTimeMath with Product with Serializable
- case class GpuDateDiff(endDate: Expression, startDate: Expression) extends BinaryExpression with GpuBinaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuDateFormatClass(timestamp: Expression, format: Expression, strfFormat: String, timeZoneId: Option[String] = None) extends BinaryExpression with GpuBinaryExpression with TimeZoneAwareExpression with ImplicitCastInputTypes with Product with Serializable
- trait GpuDateMathBase extends BinaryExpression with GpuBinaryExpression with ExpectsInputTypes
- case class GpuDateSub(startDate: Expression, days: Expression) extends BinaryExpression with GpuDateMathBase with Product with Serializable
- trait GpuDateUnaryExpression extends GpuUnaryExpression with ImplicitCastInputTypes
- case class GpuDayOfMonth(child: Expression) extends GpuUnaryExpression with GpuDateUnaryExpression with Product with Serializable
- case class GpuDayOfWeek(child: Expression) extends GpuUnaryExpression with GpuDateUnaryExpression with Product with Serializable
- case class GpuDayOfYear(child: Expression) extends GpuUnaryExpression with GpuDateUnaryExpression with Product with Serializable
-
case class
GpuDecimal128Average(child: Expression, dt: DecimalType) extends GpuDecimalAverage with Product with Serializable
Average aggregations for DECIMAL128.
To avoid the significantly slower sort-based aggregations in cudf for DECIMAL128 columns, the incoming DECIMAL128 values are split into four 32-bit chunks which are summed separately into 64-bit intermediate results and then recombined into a 128-bit result with overflow checking. See GpuDecimal128Sum for more details.
-
case class
GpuDecimal128Sum(child: Expression, dt: DecimalType, failOnErrorOverride: Boolean, forceWindowSumToNotBeReplaced: Boolean) extends GpuDecimalSum with GpuReplaceWindowFunction with Product with Serializable
Sum aggregations for DECIMAL128.
The sum aggregation is performed by splitting the original 128-bit values into 32-bit "chunks" and summing those. The chunking accomplishes two things. First, it helps avoid cudf resorting to a much slower aggregation since currently DECIMAL128 sums are only implemented for sort-based aggregations. Second, chunking allows detection of overflows.
The chunked approach to sum aggregation works as follows. The 128-bit value is split into its four 32-bit chunks, with the most significant chunk being an INT32 and the remaining three chunks being UINT32. When these are sum aggregated, cudf will implicitly upscale the accumulated result to a 64-bit value. Since cudf only allows up to 2**31 rows to be aggregated at a time, the "extra" upper 32-bits of the upscaled 64-bit accumulation values will be enough to hold the worst-case "carry" bits from summing each 32-bit chunk.
After the cudf aggregation has completed, the four 64-bit chunks are reassembled into a 128-bit value. The lowest 32-bits of the least significant 64-bit chunk are used directly as the lowest 32-bits of the final value, and the remaining 32-bits are added to the next most significant 64-bit chunk. The lowest 32-bits of that chunk then become the next 32-bits of the 128-bit value and the remaining 32-bits are added to the next 64-bit chunk, and so on. Finally after the 128-bit value is constructed, the remaining "carry" bits of the most significant chunk after reconstruction are checked against the sign bit of the 128-bit result to see if there was an overflow.
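The chunking, summing, and reassembly steps above can be sketched on the CPU with scala.math.BigInt standing in for the 128-bit unscaled decimal value. This is an illustrative model only: ChunkedSum128 and its method names are hypothetical, the decimal scale is ignored, and the real work happens in cuDF on the GPU.

```scala
object ChunkedSum128 {
  private val Mask32 = 0xFFFFFFFFL

  // Split a signed 128-bit value into four 32-bit chunks, least significant first.
  // The three lower chunks are unsigned (UINT32) and the top chunk is signed (INT32);
  // all are widened to Long, mirroring the 64-bit accumulators described above.
  def split(v: BigInt): Vector[Long] = {
    require(v.bitLength <= 127, "value must fit in a signed 128-bit integer")
    val lower = Vector.tabulate(3)(i => ((v >> (32 * i)) & BigInt(Mask32)).toLong)
    lower :+ (v >> 96).toLong // arithmetic shift preserves the sign
  }

  // Sum each chunk position independently in 64-bit arithmetic.
  def sumChunks(values: Seq[BigInt]): Vector[Long] =
    values.map(split).foldLeft(Vector.fill(4)(0L)) { (acc, chunks) =>
      acc.zip(chunks).map { case (a, b) => a + b }
    }

  // Reassemble the four 64-bit sums into a 128-bit value, propagating carries upward.
  // Returns None when the true sum does not fit in a signed 128-bit integer.
  def reassemble(sums: Vector[Long]): Option[BigInt] = {
    var carry = 0L
    var bits = BigInt(0) // unsigned 128-bit pattern under construction
    for (i <- 0 until 4) {
      val acc = sums(i) + carry
      bits = bits | (BigInt(acc & Mask32) << (32 * i))
      carry = acc >> 32 // the upper bits move on to the next chunk
    }
    val negative = bits.testBit(127)
    val value = if (negative) bits - (BigInt(1) << 128) else bits
    // The leftover carry must be pure sign extension of the 128-bit result.
    val expectedCarry = if (negative) -1L else 0L
    if (carry == expectedCarry) Some(value) else None
  }

  def sum(values: Seq[BigInt]): Option[BigInt] = reassemble(sumChunks(values))
}
```

For instance, ChunkedSum128.sum(Seq[BigInt](5, -3)) gives Some(2), while adding 1 to the largest signed 128-bit value gives None because the leftover carry bits disagree with the sign bit of the reassembled result.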
- abstract class GpuDecimalAverage extends GpuAverage
-
case class
GpuDecimalDivide(left: Expression, right: Expression, dataType: DecimalType, failOnError: Boolean = SQLConf.get.ansiEnabled) extends Expression with ShimExpression with GpuExpression with Product with Serializable
A version of Divide specifically for DecimalType that does not force the left and right to be the same type. This lets us calculate the correct result on a wider range of values without the need for unbounded precision in the processing.
- case class GpuDecimalMultiply(left: Expression, right: Expression, dataType: DecimalType, needsExtraOverflowChecks: Boolean = false, failOnError: Boolean = SQLConf.get.ansiEnabled) extends Expression with ShimExpression with GpuExpression with Product with Serializable
- abstract class GpuDecimalSum extends GpuSum
-
case class
GpuDecimalSumHighDigits(input: Expression, originalInputType: DecimalType) extends Expression with GpuExpression with ShimExpression with Product with Serializable
This extracts the highest digits from a Decimal value as a part of doing a SUM.
- trait GpuDivModLike extends CudfBinaryArithmetic
- case class GpuDivide(left: Expression, right: Expression, failOnErrorOverride: Boolean = SQLConf.get.ansiEnabled) extends CudfBinaryArithmetic with GpuDivModLike with Product with Serializable
-
class
GpuDynamicPartitionDataWriter extends GpuFileFormatDataWriter
Writes data using dynamic partition writes, meaning this single function can write to multiple directories (partitions) or files (bucketing).
- case class GpuElementAt(left: Expression, right: Expression, failOnError: Boolean) extends BinaryExpression with GpuBinaryExpression with ExpectsInputTypes with Product with Serializable
-
class
GpuEmptyDirectoryDataWriter extends GpuFileFormatDataWriter
GPU data writer for empty partitions
- case class GpuEndsWith(left: Expression, right: Expression) extends BinaryExpression with GpuBinaryExpression with Predicate with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuEqualNullSafe(left: Expression, right: Expression) extends CudfBinaryComparison with NullIntolerant with Product with Serializable
-
case class
GpuEqualTo(left: Expression, right: Expression) extends CudfBinaryComparison with NullIntolerant with Product with Serializable
The table below shows how the result is calculated for Equal-to. To make the calculation easier, we leverage the fact that cudf-result(r) is always false when either operand is NaN, so r is used in place of false when needed.
Return (lhs.nan && rhs.nan) || result[i]
+-------------+------------+----------------+--------------+----+
| lhs.isNan   | rhs.isNan  | cudf-result(r) | final-result | eq |
+-------------+------------+----------------+--------------+----+
| t           | f          | f              | r            | f  |
| f           | t          | f              | r            | f  |
| t           | t          | f              | t            | t  |
| f           | f          | r              | r            | na |
+-------------+------------+----------------+--------------+----+
-
case class
GpuEqualToNoNans(left: Expression, right: Expression) extends CudfBinaryComparison with NullIntolerant with Product with Serializable
This implementation leverages the default implementation of equal-to on the GPU to perform the binary equals comparison. This is used for operations like PivotFirst, where NaN != NaN (unlike most other cases) when pivoting on a float or double column.
- case class GpuExp(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuExpm1(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
-
case class
GpuExtractChunk32(data: Expression, chunkIdx: Int, replaceNullsWithZero: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable
Extracts a 32-bit chunk from a 128-bit value
- data
expression producing 128-bit values
- chunkIdx
index of chunk to extract (0-3)
- replaceNullsWithZero
whether to replace nulls with zero
-
abstract
class
GpuFileFormatDataWriter extends DataWriter[ColumnarBatch]
Abstract class for writing out data in a single Spark task using the GPU. This is the GPU version of org.apache.spark.sql.execution.datasources.FileFormatDataWriter.
-
case class
GpuFileSourceScanExec(relation: HadoopFsRelation, output: Seq[Attribute], requiredSchema: StructType, partitionFilters: Seq[Expression], optionalBucketSet: Option[BitSet], optionalNumCoalescedBuckets: Option[Int], dataFilters: Seq[Expression], tableIdentifier: Option[TableIdentifier], disableBucketedScan: Boolean = false, queryUsesInputFile: Boolean = false)(rapidsConf: RapidsConf) extends SparkPlan with GpuDataSourceScanExec with GpuExec with Product with Serializable
GPU version of Spark's FileSourceScanExec
- relation
The file-based relation to scan.
- output
Output attributes of the scan, including data attributes and partition attributes.
- requiredSchema
Required schema of the underlying relation, excluding partition columns.
- partitionFilters
Predicates to use for partition pruning.
- optionalBucketSet
Bucket ids for bucket pruning.
- optionalNumCoalescedBuckets
Number of coalesced buckets.
- dataFilters
Filters on non-partition columns.
- tableIdentifier
identifier for the table in the metastore.
- disableBucketedScan
Disable bucketed scan based on physical query plan.
- queryUsesInputFile
This is a parameter to easily allow turning it off in GpuTransitionOverrides if InputFileName, InputFileBlockStart, or InputFileBlockLength are used
- rapidsConf
Rapids conf
- case class GpuFirst(child: Expression, ignoreNulls: Boolean) extends Expression with GpuAggregateFunction with GpuAggregateWindowFunction with GpuDeterministicFirstLastCollectShim with ImplicitCastInputTypes with Serializable with Product
- case class GpuFloor(child: Expression, outputType: DataType) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuFromUnixTime(sec: Expression, format: Expression, strfFormat: String, timeZoneId: Option[String] = None) extends BinaryExpression with GpuBinaryExpression with TimeZoneAwareExpression with ImplicitCastInputTypes with Product with Serializable
-
case class
GpuGetArrayItem(child: Expression, ordinal: Expression, failOnError: Boolean) extends BinaryExpression with GpuBinaryExpression with ExpectsInputTypes with ExtractValue with Product with Serializable
Returns the field at ordinal in the Array child.
We need to do type checking here as the ordinal expression may be unresolved.
-
case class
GpuGetArrayStructFields(child: Expression, field: StructField, ordinal: Int, numFields: Int, containsNull: Boolean) extends GpuUnaryExpression with ExtractValue with NullIntolerant with Product with Serializable
For a child whose data type is an array of structs, extracts the ordinal-th fields of all array elements, and returns them as a new array.
No need to do type checking since it is handled by 'ExtractValue'.
- class GpuGetArrayStructFieldsMeta extends UnaryExprMeta[GetArrayStructFields]
- case class GpuGetMapValue(child: Expression, key: Expression, failOnError: Boolean) extends BinaryExpression with GpuBinaryExpression with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuGetStructField(child: Expression, ordinal: Int, name: Option[String] = None) extends UnaryExpression with ShimUnaryExpression with GpuExpression with ExtractValue with NullIntolerant with Product with Serializable
- case class GpuGetTimestamp(strTs: Expression, format: Expression, sparkFormat: String, strf: String, timeZoneId: Option[String] = None) extends GpuToTimestamp with Product with Serializable
-
case class
GpuGreaterThan(left: Expression, right: Expression) extends CudfBinaryComparison with NullIntolerant with Product with Serializable
The table below shows how the result is calculated for greater-than. To simplify the calculation, we rely on the fact that the cudf-result (r) is always false when either operand is NaN, so r is used in place of false where needed.
In this case return (lhs.nan && !rhs.nan) || result[i]
+-------------+-------------+----------------+--------------+----+
| lhs.isNan() | rhs.isNan() | cudf-result(r) | final-result | gt |
+-------------+-------------+----------------+--------------+----+
| t           | f           | f              | t            | t  |
| f           | t           | f              | r            | f  |
| t           | t           | f              | r            | f  |
| f           | f           | r              | r            | NA |
+-------------+-------------+----------------+--------------+----+
-
case class
GpuGreaterThanOrEqual(left: Expression, right: Expression) extends CudfBinaryComparison with NullIntolerant with Product with Serializable
The table below shows how the result is calculated for Greater-than-Eq. To simplify the calculation, we rely on the fact that the cudf-result (r) is always false when either operand is NaN, so r is used in place of false where needed.
In this case return lhs.isNan || result[i]
+-------------+-------------+----------------+--------------+-----+
| lhs.isNan() | rhs.isNan() | cudf-result(r) | final-result | gte |
+-------------+-------------+----------------+--------------+-----+
| t           | f           | f              | t            | t   |
| f           | t           | f              | r            | f   |
| t           | t           | f              | t            | t   |
| f           | f           | r              | r            | NA  |
+-------------+-------------+----------------+--------------+-----+
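The greater-than and greater-than-or-equal rules in the two tables above can be sketched on plain Scala doubles. This is only an illustration: the object and method names are hypothetical, and the JVM's IEEE comparison (which is false whenever either side is NaN) stands in for the cudf result r.

```scala
object NanAwareGreaterSketch {
  // gt: (lhs.nan && !rhs.nan) || r
  def greaterThan(lhs: Double, rhs: Double): Boolean = {
    val r = lhs > rhs // stand-in for the cudf result; false if either side is NaN
    (lhs.isNaN && !rhs.isNaN) || r
  }

  // gte: lhs.nan || r
  def greaterThanOrEqual(lhs: Double, rhs: Double): Boolean = {
    val r = lhs >= rhs // stand-in for the cudf result; false if either side is NaN
    lhs.isNaN || r
  }
}
```

Under these rules NaN compares greater than every non-NaN value and equal to itself, which is why the two functions differ only in the NaN-versus-NaN row.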
- case class GpuGreatest(children: Seq[Expression]) extends Expression with GpuGreatestLeastBase with Product with Serializable
- trait GpuGreatestLeastBase extends Expression with ComplexTypeMergingExpression with GpuExpression with ShimExpression
- case class GpuHour(child: Expression, timeZoneId: Option[String] = None) extends GpuUnaryExpression with GpuTimeUnaryExpression with Product with Serializable
- case class GpuHypot(left: Expression, right: Expression) extends CudfBinaryMathExpression with Product with Serializable
- case class GpuInMemoryTableScanExec(attributes: Seq[Attribute], predicates: Seq[Expression], relation: InMemoryRelation) extends SparkPlan with LeafExecNode with GpuExec with Product with Serializable
- case class GpuInitCap(child: Expression) extends GpuUnaryExpression with ImplicitCastInputTypes with Product with Serializable
-
case class
GpuInputFileBlockLength() extends GpuLeafExpression with Product with Serializable
Returns the length of the block being read, or -1 if not available. This is especially difficult because batches cannot be coalesced between the input file and the point where this expression is used; otherwise the wrong value could be returned.
-
case class
GpuInputFileBlockStart() extends GpuLeafExpression with Product with Serializable
Returns the start offset of the block being read, or -1 if not available. This is especially difficult because batches cannot be coalesced between the input file and the point where this expression is used; otherwise the wrong value could be returned.
-
case class
GpuInputFileName() extends GpuLeafExpression with Product with Serializable
Returns the name of the file being read, or empty string if not available. This is especially difficult because batches cannot be coalesced between the input file and the point where this expression is used; otherwise the wrong value could be returned.
- case class GpuInsertIntoHadoopFsRelationCommand(outputPath: Path, staticPartitions: TablePartitionSpec, ifPartitionNotExists: Boolean, partitionColumns: Seq[Attribute], bucketSpec: Option[BucketSpec], fileFormat: ColumnarFileFormat, options: Map[String, String], query: LogicalPlan, mode: SaveMode, catalogTable: Option[CatalogTable], fileIndex: Option[FileIndex], outputColumnNames: Seq[String], useStableSort: Boolean) extends LogicalPlan with GpuDataWritingCommand with Product with Serializable
- case class GpuIntegralDivide(left: Expression, right: Expression) extends CudfBinaryArithmetic with GpuDivModLike with Product with Serializable
- case class GpuLast(child: Expression, ignoreNulls: Boolean) extends Expression with GpuAggregateFunction with GpuAggregateWindowFunction with GpuDeterministicFirstLastCollectShim with ImplicitCastInputTypes with Serializable with Product
- case class GpuLastDay(startDate: Expression) extends GpuUnaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuLeast(children: Seq[Expression]) extends Expression with GpuGreatestLeastBase with Product with Serializable
- case class GpuLength(child: Expression) extends GpuUnaryExpression with ExpectsInputTypes with Product with Serializable
-
case class
GpuLessThan(left: Expression, right: Expression) extends CudfBinaryComparison with NullIntolerant with Product with Serializable
The table below shows how the result is calculated for Less-than. To simplify the calculation, we rely on the fact that the cudf-result (r) is always false when either operand is NaN, so r is used in place of false where needed.
In this case return (!lhs.nan && rhs.nan) || result[i]
+-------------+-------------+----------------+--------------+----+
| lhs.isNan() | rhs.isNan() | cudf-result(r) | final-result | lt |
+-------------+-------------+----------------+--------------+----+
| t           | f           | f              | r            | f  |
| f           | t           | f              | t            | t  |
| t           | t           | f              | r            | f  |
| f           | f           | r              | r            | NA |
+-------------+-------------+----------------+--------------+----+
-
case class
GpuLessThanOrEqual(left: Expression, right: Expression) extends CudfBinaryComparison with NullIntolerant with Product with Serializable
The table below shows how the result is calculated for Less-than-Eq. To simplify the calculation, we rely on the fact that the cudf-result (r) is always false when either operand is NaN, so r is used in place of false where needed.
In this case, return rhs.nan || result[i]
+-------------+-------------+----------------+--------------+-----+
| lhs.isNan() | rhs.isNan() | cudf-result(r) | final-result | lte |
+-------------+-------------+----------------+--------------+-----+
| t           | f           | f              | r            | f   |
| f           | t           | f              | t            | t   |
| t           | t           | f              | t            | t   |
| f           | f           | r              | r            | NA  |
+-------------+-------------+----------------+--------------+-----+
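The less-than and less-than-or-equal rules in the two tables above mirror the greater-than case with the operands' roles swapped. A minimal sketch on plain Scala doubles follows; the names are hypothetical, and the JVM's IEEE comparison stands in for the cudf result r.

```scala
object NanAwareLessSketch {
  // lt: (!lhs.nan && rhs.nan) || r
  def lessThan(lhs: Double, rhs: Double): Boolean = {
    val r = lhs < rhs // stand-in for the cudf result; false if either side is NaN
    (!lhs.isNaN && rhs.isNaN) || r
  }

  // lte: rhs.nan || r
  def lessThanOrEqual(lhs: Double, rhs: Double): Boolean = {
    val r = lhs <= rhs // stand-in for the cudf result; false if either side is NaN
    rhs.isNaN || r
  }
}
```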
- case class GpuLike(left: Expression, right: Expression, escapeChar: Char) extends BinaryExpression with GpuBinaryExpression with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuLog(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuLogarithm(left: Expression, right: Expression) extends CudfBinaryMathExpression with Product with Serializable
- case class GpuLower(child: Expression) extends GpuUnaryString2StringExpression with Product with Serializable
-
abstract
class
GpuM2 extends Expression with GpuAggregateFunction with ImplicitCastInputTypes with Serializable
Base class for overriding standard deviation and variance aggregations. This is also a GPU-based implementation of 'CentralMomentAgg' aggregation class in Spark with the fixed 'momentOrder' variable set to '2'.
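A moment order of 2 means the aggregation buffer tracks a count, a mean, and the second central moment M2 (the sum of squared deviations from the mean). The sketch below shows one well-known way to maintain such a buffer, Welford's update plus the pairwise merge of Chan et al.; whether GpuM2 uses exactly these formulas is an assumption, and the names M2State, update and merge are hypothetical.

```scala
// Hypothetical aggregation buffer: count, mean and second central moment.
final case class M2State(n: Double, mean: Double, m2: Double) {
  // Welford's single-value update.
  def update(x: Double): M2State = {
    val n1 = n + 1
    val delta = x - mean
    val mean1 = mean + delta / n1
    M2State(n1, mean1, m2 + delta * (x - mean1))
  }

  // Pairwise merge of two partial buffers (Chan et al.).
  def merge(other: M2State): M2State =
    if (n == 0) other
    else if (other.n == 0) this
    else {
      val total = n + other.n
      val delta = other.mean - mean
      M2State(total,
        mean + delta * other.n / total,
        m2 + other.m2 + delta * delta * n * other.n / total)
    }

  // The divide-by-zero cases (n == 0, or n == 1 for the sample variance)
  // are not handled in this sketch.
  def variancePop: Double = m2 / n
  def varianceSamp: Double = m2 / (n - 1)
  def stddevPop: Double = math.sqrt(variancePop)
  def stddevSamp: Double = math.sqrt(varianceSamp)
}

object M2State {
  val empty: M2State = M2State(0.0, 0.0, 0.0)
}
```

The four concrete aggregations (population and sample variance, population and sample standard deviation) then differ only in the final division and square root applied to the same buffer.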
- case class GpuMapConcat(children: Seq[Expression]) extends Expression with GpuComplexTypeMergingExpression with Product with Serializable
- case class GpuMapEntries(child: Expression) extends GpuUnaryExpression with ExpectsInputTypes with Product with Serializable
- case class GpuMapKeys(child: Expression) extends GpuUnaryExpression with ExpectsInputTypes with Product with Serializable
- case class GpuMapValues(child: Expression) extends GpuUnaryExpression with ExpectsInputTypes with Product with Serializable
- case class GpuMax(child: Expression) extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Product with Serializable
- case class GpuMd5(child: Expression) extends GpuUnaryExpression with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuMin(child: Expression) extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Product with Serializable
- case class GpuMinute(child: Expression, timeZoneId: Option[String] = None) extends GpuUnaryExpression with GpuTimeUnaryExpression with Product with Serializable
- case class GpuMonth(child: Expression) extends GpuUnaryExpression with GpuDateUnaryExpression with Product with Serializable
-
class
GpuMultiFileAvroPartitionReader extends MultiFileCoalescingPartitionReaderBase with GpuAvroReaderBase
A PartitionReader that can read multiple AVRO files up to a certain size. It will coalesce small files together and copy the block data in a separate thread pool to speed up processing the small files before sending down to the GPU.
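The idea of coalescing small inputs up to a size limit can be sketched as a greedy grouping. Everything here is hypothetical (the names FileChunk, coalesce and maxBatchBytes, and the greedy policy itself); it only illustrates the concept and says nothing about how the reader actually chooses its batches or uses its thread pool.

```scala
object CoalesceSketch {
  // Hypothetical description of one small input and its size in bytes.
  final case class FileChunk(path: String, sizeBytes: Long)

  // Greedily packs consecutive chunks into groups whose total size stays
  // within maxBatchBytes. A chunk larger than the limit gets its own group.
  def coalesce(chunks: Seq[FileChunk], maxBatchBytes: Long): Vector[Vector[FileChunk]] = {
    val start = (Vector.empty[Vector[FileChunk]], Vector.empty[FileChunk], 0L)
    val (done, current, _) = chunks.foldLeft(start) {
      case ((finished, cur, curBytes), chunk) =>
        if (cur.nonEmpty && curBytes + chunk.sizeBytes > maxBatchBytes)
          (finished :+ cur, Vector(chunk), chunk.sizeBytes)
        else
          (finished, cur :+ chunk, curBytes + chunk.sizeBytes)
    }
    if (current.nonEmpty) done :+ current else done
  }
}
```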
-
class
GpuMultiFileCloudAvroPartitionReader extends MultiFileCloudPartitionReaderBase with MultiFileReaderFunctions with GpuAvroReaderBase
A PartitionReader that can read multiple AVRO files in parallel. This is most efficient running in a cloud environment where the I/O of reading is slow.
When reading a file, it
- seeks to the start position of the first block located in this partition.
- next, parses the meta and sync, rewrites the meta and sync, and copies the data to a batch buffer per block, until reaching the last block of the current partition.
- finally, sends the batches to the GPU.
- case class GpuMultiply(left: Expression, right: Expression) extends CudfBinaryArithmetic with Product with Serializable
- case class GpuMurmur3Hash(children: Seq[Expression], seed: Int) extends Expression with GpuExpression with ShimExpression with Product with Serializable
- case class GpuNormalizeNaNAndZero(child: Expression) extends GpuUnaryExpression with ExpectsInputTypes with Product with Serializable
- case class GpuNot(child: Expression) extends GpuUnaryExpression with CudfUnaryExpression with Predicate with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuNthValue(child: Expression, offset: Expression, ignoreNulls: Boolean) extends Expression with GpuAggregateWindowFunction with ImplicitCastInputTypes with Serializable with Product
- case class GpuOctetLength(child: Expression) extends GpuUnaryExpression with ExpectsInputTypes with Product with Serializable
- case class GpuOr(left: Expression, right: Expression) extends CudfBinaryPredicateWithSideEffect with Product with Serializable
- class GpuOrcFileFormat extends ColumnarFileFormat with Logging
- class GpuOrcWriter extends ColumnarOutputWriter
- class GpuPartitionwiseSampledRDD extends PartitionwiseSampledRDD[ColumnarBatch, ColumnarBatch]
- case class GpuPivotFirst(pivotColumn: Expression, valueColumn: Expression, pivotColumnValues: Seq[Any]) extends Expression with GpuAggregateFunction with Product with Serializable
- case class GpuPmod(left: Expression, right: Expression) extends CudfBinaryArithmetic with GpuDivModLike with Product with Serializable
- class GpuPoissonSampler extends PoissonSampler[ColumnarBatch] with Arm
- case class GpuPow(left: Expression, right: Expression) extends CudfBinaryMathExpression with Product with Serializable
-
case class
GpuPreciseTimestampConversion(child: Expression, fromType: DataType, toType: DataType) extends GpuUnaryExpression with ExpectsInputTypes with Product with Serializable
Expression used internally to convert the TimestampType to Long and back without losing precision, i.e. in microseconds. Used in time windowing.
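To make "without losing precision" concrete, the sketch below round-trips a timestamp through a Long count of microseconds since the Unix epoch using java.time. The object and method names are hypothetical, and java.time.Instant is only a stand-in for a timestamp value; this is not how the expression itself is implemented.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

object TimestampMicrosSketch {
  // Timestamp to microseconds since the Unix epoch.
  def toMicros(ts: Instant): Long = ChronoUnit.MICROS.between(Instant.EPOCH, ts)

  // Microseconds since the Unix epoch back to a timestamp.
  def fromMicros(micros: Long): Instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS)
}
```

A round trip through microseconds returns the original value for any timestamp with microsecond resolution, whereas a round trip through milliseconds would drop the last three digits.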
- trait GpuPredicateHelper extends AnyRef
- case class GpuQuarter(child: Expression) extends GpuUnaryExpression with GpuDateUnaryExpression with Product with Serializable
- case class GpuRLike(left: Expression, right: Expression, pattern: String) extends BinaryExpression with GpuBinaryExpression with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- class GpuRLikeMeta extends BinaryExprMeta[RLike]
- case class GpuRaiseError(child: Expression) extends GpuUnaryExpression with ExpectsInputTypes with Arm with Product with Serializable
-
class
GpuReadAvroFileFormat extends AvroFileFormat with GpuReadFileFormatWithMetrics
A FileFormat that allows reading Avro files with the GPU.
- case class GpuRegExpExtract(subject: Expression, regexp: Expression, idx: Expression, cudfRegexPattern: String) extends GpuRegExpTernaryBase with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuRegExpExtractAll(str: Expression, regexp: Expression, idx: Expression, numGroups: Int, cudfRegexPattern: String) extends GpuRegExpTernaryBase with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- class GpuRegExpExtractAllMeta extends TernaryExprMeta[RegExpExtractAll]
- class GpuRegExpExtractMeta extends TernaryExprMeta[RegExpExtract]
- case class GpuRegExpReplace(srcExpr: Expression, searchExpr: Expression, replaceExpr: Expression, javaRegexpPattern: String, cudfRegexPattern: String, cudfReplacementString: String) extends GpuRegExpTernaryBase with ImplicitCastInputTypes with Product with Serializable
- case class GpuRegExpReplaceWithBackref(child: Expression, cudfRegexPattern: String, cudfReplacementString: String) extends GpuUnaryExpression with ImplicitCastInputTypes with Product with Serializable
- abstract class GpuRegExpTernaryBase extends TernaryExpression with GpuTernaryExpression
- case class GpuRemainder(left: Expression, right: Expression) extends CudfBinaryArithmetic with GpuDivModLike with Product with Serializable
- case class GpuRint(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuRound(child: Expression, scale: Expression, outputType: DataType) extends GpuRoundBase with Product with Serializable
- abstract class GpuRoundBase extends BinaryExpression with GpuBinaryExpression with Serializable with ImplicitCastInputTypes
- case class GpuRowBasedScalaUDF(sparkFunc: AnyRef, dataType: DataType, children: Seq[Expression], inputEncoders: Seq[Option[ExpressionEncoder[_]]], outputEncoder: Option[ExpressionEncoder[_]], udfName: Option[String], nullable: Boolean, udfDeterministic: Boolean) extends Expression with GpuRowBasedUserDefinedFunction with Product with Serializable
- case class GpuScalaUDF(function: RapidsUDF, dataType: DataType, children: Seq[Expression], udfName: Option[String], nullable: Boolean, udfDeterministic: Boolean) extends Expression with GpuUserDefinedFunction with Product with Serializable
-
case class
GpuScalarSubquery(plan: BaseSubqueryExec, exprId: ExprId) extends ExecSubqueryExpression with GpuExpression with ShimExpression with Product with Serializable
GPU placeholder of ScalarSubquery, which returns the scalar result with columnarEval method. This placeholder lets ScalarSubquery work as a GpuExpression so that it can cooperate with other GPU overrides.
- case class GpuSecond(child: Expression, timeZoneId: Option[String] = None) extends GpuUnaryExpression with GpuTimeUnaryExpression with Product with Serializable
- case class GpuSequence(start: Expression, stop: Expression, stepOpt: Option[Expression], timeZoneId: Option[String] = None) extends Expression with TimeZoneAwareExpression with GpuExpression with ShimExpression with Product with Serializable
- class GpuSequenceMeta extends ExprMeta[Sequence]
-
class
GpuSerializableBatch extends Serializable with AutoCloseable with Arm
- Annotations
- @SerialVersionUID()
- trait GpuShiftBase extends BinaryExpression with GpuBinaryExpression with ImplicitCastInputTypes
- case class GpuShiftLeft(left: Expression, right: Expression) extends BinaryExpression with GpuShiftBase with Product with Serializable
- case class GpuShiftRight(left: Expression, right: Expression) extends BinaryExpression with GpuShiftBase with Product with Serializable
- case class GpuShiftRightUnsigned(left: Expression, right: Expression) extends BinaryExpression with GpuShiftBase with Product with Serializable
- abstract class GpuShuffleBlockResolverBase extends ShuffleBlockResolver with Logging
- class GpuShuffleDependency[K, V, C] extends ShuffleDependency[K, V, C]
- class GpuShuffleEnv extends Logging
- class GpuShuffleHandle[K, V] extends BaseShuffleHandle[K, V, V]
- case class GpuSignum(child: Expression) extends GpuUnaryMathExpression with Product with Serializable
- case class GpuSin(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
-
class
GpuSingleDirectoryDataWriter extends GpuFileFormatDataWriter
Writes data to a single directory (used for non-dynamic-partition writes).
- case class GpuSinh(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuSize(child: Expression, legacySizeOfNull: Boolean) extends GpuUnaryExpression with Product with Serializable
- case class GpuSortArray(base: Expression, ascendingOrder: Expression) extends BinaryExpression with GpuBinaryExpression with ExpectsInputTypes with Product with Serializable
- case class GpuSqrt(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuStartsWith(left: Expression, right: Expression) extends BinaryExpression with GpuBinaryExpression with Predicate with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuStddevPop(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with Product with Serializable
- case class GpuStddevSamp(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with GpuReplaceWindowFunction with Product with Serializable
- case class GpuStringLPad(str: Expression, len: Expression, pad: Expression) extends TernaryExpression with BasePad with Product with Serializable
- case class GpuStringLocate(substr: Expression, col: Expression, start: Expression) extends TernaryExpression with GpuTernaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuStringRPad(str: Expression, len: Expression, pad: Expression) extends TernaryExpression with BasePad with Product with Serializable
- case class GpuStringRepeat(input: Expression, repeatTimes: Expression) extends BinaryExpression with GpuBinaryExpression with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuStringReplace(srcExpr: Expression, searchExpr: Expression, replaceExpr: Expression) extends TernaryExpression with GpuTernaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuStringSplit(str: Expression, regex: Expression, limit: Expression, pattern: String, isRegExp: Boolean) extends TernaryExpression with GpuTernaryExpression with ImplicitCastInputTypes with Product with Serializable
- class GpuStringSplitMeta extends StringSplitRegExpMeta[StringSplit]
- case class GpuStringToMap(strExpr: Expression, pairDelimExpr: Expression, keyValueDelimExpr: Expression, pairDelim: String, isPairDelimRegExp: Boolean, keyValueDelim: String, isKeyValueDelimRegExp: Boolean) extends Expression with GpuExpression with ShimExpression with ExpectsInputTypes with Product with Serializable
- class GpuStringToMapMeta extends StringSplitRegExpMeta[StringToMap]
- case class GpuStringTrim(column: Expression, trimParameters: Option[Expression] = None) extends Expression with GpuString2TrimExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuStringTrimLeft(column: Expression, trimParameters: Option[Expression] = None) extends Expression with GpuString2TrimExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuStringTrimRight(column: Expression, trimParameters: Option[Expression] = None) extends Expression with GpuString2TrimExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuSubstring(str: Expression, pos: Expression, len: Expression) extends TernaryExpression with GpuTernaryExpression with ImplicitCastInputTypes with NullIntolerant with Product with Serializable
- case class GpuSubstringIndex(strExpr: Expression, regexp: String, ignoredDelimExpr: Expression, ignoredCountExpr: Expression) extends TernaryExpression with GpuTernaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class GpuSubtract(left: Expression, right: Expression, failOnError: Boolean) extends CudfBinaryArithmetic with Product with Serializable
- abstract class GpuSum extends Expression with GpuAggregateFunction with ImplicitCastInputTypes with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Serializable
- case class GpuTan(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- case class GpuTanh(child: Expression) extends CudfUnaryMathExpression with Product with Serializable
- abstract class GpuTimeMath extends BinaryExpression with ShimBinaryExpression with GpuExpression with TimeZoneAwareExpression with ExpectsInputTypes with Serializable
- trait GpuTimeUnaryExpression extends GpuUnaryExpression with TimeZoneAwareExpression with ImplicitCastInputTypes with NullIntolerant
- trait GpuToCpuAggregateBufferConverter extends AnyRef
- trait GpuToCpuBufferTransition extends UnaryExpression with ShimUnaryExpression with CodegenFallback
- class GpuToCpuCollectBufferConverter extends GpuToCpuAggregateBufferConverter
- case class GpuToCpuCollectBufferTransition(child: Expression) extends UnaryExpression with GpuToCpuBufferTransition with Product with Serializable
- case class GpuToDegrees(child: Expression) extends GpuUnaryMathExpression with Product with Serializable
- case class GpuToRadians(child: Expression) extends GpuUnaryMathExpression with Product with Serializable
-
abstract
class
GpuToTimestamp extends BinaryExpression with GpuBinaryExpression with TimeZoneAwareExpression with ExpectsInputTypes
A direct conversion of Spark's ToTimestamp class which converts time to UNIX timestamp by first converting to microseconds and then dividing by the downScaleFactor
-
abstract
class
GpuToTimestampImproved extends GpuToTimestamp
An improved version of GpuToTimestamp conversion which converts time to UNIX timestamp without first converting to microseconds
- case class GpuToUnixTimestamp(strTs: Expression, format: Expression, sparkFormat: String, strf: String, timeZoneId: Option[String] = None) extends GpuToTimestamp with Product with Serializable
- case class GpuToUnixTimestampImproved(strTs: Expression, format: Expression, sparkFormat: String, strf: String, timeZoneId: Option[String] = None) extends GpuToTimestampImproved with Product with Serializable
- abstract class GpuUnaryMathExpression extends GpuUnaryExpression with Serializable with ImplicitCastInputTypes
- case class GpuUnaryMinus(child: Expression, failOnError: Boolean) extends GpuUnaryExpression with ExpectsInputTypes with NullIntolerant with Product with Serializable
- case class GpuUnaryPositive(child: Expression) extends GpuUnaryExpression with ExpectsInputTypes with NullIntolerant with Product with Serializable
- abstract class GpuUnaryString2StringExpression extends GpuUnaryExpression with ExpectsInputTypes
- case class GpuUnixTimestamp(strTs: Expression, format: Expression, sparkFormat: String, strf: String, timeZoneId: Option[String] = None) extends GpuToTimestamp with Product with Serializable
- case class GpuUnixTimestampImproved(strTs: Expression, format: Expression, sparkFormat: String, strf: String, timeZoneId: Option[String] = None) extends GpuToTimestampImproved with Product with Serializable
- case class GpuUpper(child: Expression) extends GpuUnaryString2StringExpression with Product with Serializable
- case class GpuVariancePop(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with Product with Serializable
- case class GpuVarianceSamp(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with Product with Serializable
- case class GpuWeekDay(child: Expression) extends GpuUnaryExpression with GpuDateUnaryExpression with Product with Serializable
-
class
GpuWriteJobDescription extends Serializable
A shared job description for all the GPU write tasks. This is the GPU version of org.apache.spark.sql.execution.datasources.WriteJobDescription.
-
class
GpuWriteJobStatsTracker extends BasicColumnarWriteJobStatsTracker
Simple ColumnarWriteJobStatsTracker implementation that's serializable, capable of instantiating GpuWriteTaskStatsTracker on executors and processing the WriteTaskStats they produce by aggregating the metrics and posting them as DriverMetricUpdates.
-
class
GpuWriteTaskStatsTracker extends BasicColumnarWriteTaskStatsTracker
ColumnarWriteTaskStatsTracker implementation that produces WriteTaskStats and tracks writing times per task.
-
case class
GpuWriterBucketSpec(bucketIdExpression: Expression, bucketFileNamePrefix: (Int) ⇒ String) extends Product with Serializable
Bucketing specification for all the write tasks. This is the GPU version of org.apache.spark.sql.execution.datasources.WriterBucketSpec.
- bucketIdExpression
Expression to calculate bucket id based on bucket column(s).
- bucketFileNamePrefix
Prefix of output file name based on bucket id.
- case class GpuYear(child: Expression) extends GpuUnaryExpression with GpuDateUnaryExpression with Product with Serializable
- class InMemoryTableScanMeta extends SparkPlanMeta[InMemoryTableScanExec]
- case class ParseFormatMeta(separator: Option[Char], isTimestamp: Boolean, validRegex: String) extends Product with Serializable
-
class
ProxyRapidsShuffleInternalManagerBase extends RapidsShuffleManagerLike with Proxy
A simple proxy wrapper that delays loading of the real implementation until a later point, when ShimLoader has already updated the Spark classloaders.
- class RapidsCachingReader[K, C] extends ShuffleReader[K, C] with Arm with Logging
- class RapidsCachingWriter[K, V] extends ShuffleWriter[K, V] with Logging
-
class
RapidsDiskBlockManager extends AnyRef
Maps logical blocks to local disk locations.
-
abstract
class
RapidsShuffleInternalManagerBase extends ShuffleManager with RapidsShuffleHeartbeatHandler with Logging
A shuffle manager optimized for the RAPIDS Plugin For Apache Spark.
- Note
This is an internal class to obtain access to the private ShuffleManager and SortShuffleManager classes. When configuring Apache Spark to use the RAPIDS shuffle manager,
-
trait
RapidsShuffleManagerLike extends AnyRef
Trait that makes it easy to check whether we are dealing with a RAPIDS Shuffle Manager
- abstract class RapidsShuffleThreadedWriterBase[K, V] extends ShuffleWriter[K, V] with RapidsShuffleWriterShimHelper with Arm with Logging
- trait RapidsShuffleWriterShimHelper extends AnyRef
- case class RegexReplace(search: String, replace: String) extends Product with Serializable
- trait ShuffleMetricsUpdater extends AnyRef
- abstract class StringSplitRegExpMeta[INPUT <: TernaryExpression] extends TernaryExprMeta[INPUT]
- class SubstringIndexMeta extends TernaryExprMeta[SubstringIndex]
- case class TempSpillBufferId extends RapidsBufferId with Product with Serializable
-
class
ThreadSafeShuffleWriteMetricsReporter extends ShuffleWriteMetrics
The ShuffleWriteMetricsReporter is based on accumulators, which are not thread safe. This class is a thin wrapper that adds synchronization, since these metrics will be written by multiple threads.
- sealed trait TimeParserPolicy extends Serializable
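The thin synchronizing wrapper described for ThreadSafeShuffleWriteMetricsReporter follows a common pattern, sketched here in a self-contained form. UnsafeMetrics and ThreadSafeMetrics are hypothetical stand-ins, not Spark or plugin classes, and a single bytes counter stands in for the full set of metrics.

```scala
object SynchronizedWrapperSketch {
  // Hypothetical stand-in for a metrics sink that is not thread safe.
  class UnsafeMetrics {
    private var bytes = 0L
    def incBytesWritten(v: Long): Unit = bytes += v
    def bytesWritten: Long = bytes
  }

  // Thin wrapper: every call is forwarded while holding the wrapper's lock,
  // so concurrent writers cannot interleave inside the underlying object.
  class ThreadSafeMetrics(underlying: UnsafeMetrics) {
    def incBytesWritten(v: Long): Unit = synchronized { underlying.incBytesWritten(v) }
    def bytesWritten: Long = synchronized { underlying.bytesWritten }
  }
}
```

The wrapper is only safe as long as all threads go through it; any code that keeps a direct reference to the underlying object bypasses the lock.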
- abstract class UnixTimeExprMeta[A <: BinaryExpression with TimeZoneAwareExpression] extends BinaryExprMeta[A]
- case class WindowStddevSamp(child: Expression, nullOnDivideByZero: Boolean) extends Expression with GpuAggregateWindowFunction with Product with Serializable
- case class WrappedAggFunction(aggregateFunction: GpuAggregateFunction, filter: Expression) extends Expression with GpuAggregateFunction with Product with Serializable
Value Members
- object BasicColumnarWriteJobStatsTracker extends Serializable
- object CorrectedTimeParserPolicy extends TimeParserPolicy
- object CudfNthLikeAggregate
- object CudfRegexp
- object ExceptionTimeParserPolicy extends TimeParserPolicy
-
object
ExternalSource extends Logging
The subclass of AvroProvider imports spark-avro classes. This file should not import spark-avro classes because a class not found exception may be thrown if spark-avro does not exist at runtime. For details, see: https://github.com/NVIDIA/spark-rapids/issues/5648
- object GpuAdd extends Arm with Serializable
- object GpuAnsi extends Arm
- object GpuAverage extends Serializable
- object GpuAvroScan extends Serializable
- object GpuCreateMap extends Arm with Serializable
- object GpuDataSource extends Logging with Serializable
- object GpuDataSourceScanExec extends Serializable
- object GpuDecimalDivide extends Serializable
- object GpuDecimalMultiply extends Arm with Serializable
-
object
GpuDecimalSumOverflow
All decimal processing in Spark has overflow detection as a part of it. Either it replaces the value with a null in non-ANSI mode, or it throws an exception in ANSI mode. Spark will also do the processing for larger values as Decimal values which are based on BigDecimal and have unbounded precision. So in most cases it is impossible to overflow/underflow so much that an incorrect value is returned. Spark will just use more and more memory to hold the value and then check for overflow at some point when the result needs to be turned back into a 128-bit value.
We cannot do the same thing. Instead we take the following strategies to detect overflow.
1. For decimal values with a precision of 8 or under we follow Spark and do the SUM on the unscaled value as a long, and then bit-cast the result back to a Decimal value. This means that we can SUM 174,467,442,481 maximum or minimum decimal values with a precision of 8 before overflow can no longer be detected. It is much higher for decimal values with a smaller precision.
2. For decimal values with a precision from 9 to 20 inclusive we sum them as 128-bit values. This is very similar to what we do in the first strategy. The main differences are that we use a 128-bit value when doing the sum, and we check for overflow after processing each batch. In the case of group-by and reduction that happens after the update stage and also after each merge stage. This gives us enough room that we can always detect overflow when summing a single batch, even on a merge where we could be doing the aggregation on a batch that has all max output values in it.
3. For values from 21 to 28 inclusive we have enough room to not check for overflow on the update aggregation, but for the merge aggregation we need to do some extra checks. This is done by taking the digits above 28 and summing them separately. We then check to see if they would have overflowed the original limits. This lets us detect overflow in cases where the original value would have wrapped around. The reason this works is that we have a hard limit on the maximum number of values in a single batch being processed: Int.MaxValue, or about 2.15 billion values. So we use a precision on the higher values that is large enough to handle 2.15 billion values and still detect overflow. This equates to a precision of about 10 more than is needed to hold the higher digits. This effectively gives us unlimited overflow detection.
4. For anything larger than precision 28 we do the same overflow detection as in strategy 3, but also do it on the update aggregation. This lets us fully detect overflows in any stage of an aggregation.
Note that for Window operations either there is no merge stage or it only has a single value being merged into a batch instead of an entire batch being merged together. This lets us handle the overflow detection with what is built into GpuAdd.
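For contrast with the GPU strategies, the CPU-side behavior described at the start of this entry (sum with unbounded precision, then check for overflow when the result is narrowed, returning null in non-ANSI mode or throwing in ANSI mode) can be sketched as follows. The names DecimalSumSketch, sumUnscaled and resultPrecision are hypothetical, None stands in for a SQL null, and this is an illustration rather than Spark's or the plugin's actual code.

```scala
object DecimalSumSketch {
  // Sums unscaled decimal values with unbounded precision, then checks whether
  // the total still fits in resultPrecision decimal digits.
  // Non-ANSI mode: overflow yields None (null). ANSI mode: overflow throws.
  def sumUnscaled(unscaled: Seq[Long], resultPrecision: Int, ansi: Boolean): Option[BigInt] = {
    val bound = BigInt(10).pow(resultPrecision) // first magnitude that no longer fits
    val total = unscaled.foldLeft(BigInt(0))((acc, v) => acc + BigInt(v))
    if (total.abs < bound) Some(total)
    else if (ansi) throw new ArithmeticException(
      s"Decimal sum overflows precision $resultPrecision")
    else None
  }
}
```

Because the intermediate total is a BigInt, the check can be deferred to the very end; the fixed-width GPU sums described above cannot do that, which is why they need the per-batch and per-stage checks.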
- object GpuDivModLike extends Arm
-
object
GpuFileFormatWriter extends Logging
A helper object for writing columnar data out to a location.
- object GpuFileSourceScanExec extends Serializable
- object GpuFloorCeil
- object GpuHypot extends Arm with Serializable
- object GpuLogarithm extends Arm with Serializable
- object GpuMurmur3Hash extends Arm with Serializable
- object GpuOrcFileFormat extends Logging
- object GpuReadAvroFileFormat extends Serializable
- object GpuRegExpUtils
- object GpuScalaUDF extends Serializable
- object GpuScalaUDFMeta
- object GpuSequenceUtil extends Arm
- object GpuShuffleEnv extends Logging
- object GpuSubstringIndex extends Serializable
- object GpuSum extends Serializable
- object GpuToTimestamp extends Arm
- object GpuWriteJobStatsTracker extends Serializable
- object InputFileUtils
- object LegacyTimeParserPolicy extends TimeParserPolicy
- object PCBSSchemaHelper
- object RapidsShuffleInternalManagerBase extends Logging
- object ShiftHelper extends Arm
- object TempSpillBufferId extends Serializable