Packages

  • package com.nvidia.spark.rapids
  • class GpuHashAggregateIterator extends Iterator[ColumnarBatch] with Arm with AutoCloseable with Logging

    Iterator that takes another columnar batch iterator as input and emits new columnar batches that are aggregated based on the specified grouping and aggregation expressions. This iterator tries to perform a hash-based aggregation but can fall back to a sort-based aggregation, which can operate on data that is either larger than can be represented by a cuDF column or larger than can fit in GPU memory.

    The iterator starts by pulling all batches from the input iterator and performing an initial projection and aggregation on each individual batch via aggregateInputBatches(). The resulting aggregated batches are cached in memory as spillable batches. Once all input batches have been aggregated, tryMergeAggregatedBatches() is called to attempt to merge the aggregated batches into a single batch. If this succeeds, the resulting batch is returned; otherwise, buildSortFallbackIterator is used to sort the aggregated batches by the grouping keys and perform a final merge aggregation pass on the sorted batches.

    Definition Classes
    rapids
  • AggHelper
  • GroupedIterator
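The GpuHashAggregateIterator flow described above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: batches are plain in-memory sequences of (key, value) rows, the only aggregate is a sum, and the hypothetical maxMergeRows limit stands in for the cuDF column size and GPU memory limits that force the sort-based fallback. The helper names only mirror the roles of aggregateInputBatches(), tryMergeAggregatedBatches(), and buildSortFallbackIterator.

```scala
import scala.collection.mutable.ArrayBuffer

object HashAggFlowSketch {
  // Hypothetical "batch": (grouping key, value) rows; real batches are columnar.
  type Batch = Seq[(String, Long)]

  // Per-batch hash aggregation (the role played by aggregateInputBatches()).
  def aggregateBatch(batch: Batch): Batch =
    batch.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2).sum }.toSeq

  // Try to merge every partial result into one batch (the role of
  // tryMergeAggregatedBatches()); give up if the result is too large.
  def tryMerge(partials: Seq[Batch], maxMergeRows: Int): Option[Batch] = {
    val merged = aggregateBatch(partials.flatten)
    if (merged.size <= maxMergeRows) Some(merged) else None
  }

  // Sort-based fallback (the role of buildSortFallbackIterator): sort partial
  // rows by key, merge runs of equal keys in one linear pass, and emit chunks
  // of at most maxMergeRows rows. A real implementation would sort out of core
  // and stream; this sketch keeps everything in memory for brevity.
  def sortFallback(partials: Seq[Batch], maxMergeRows: Int): Iterator[Batch] = {
    val sorted = partials.flatten.sortBy(_._1)
    val merged = ArrayBuffer.empty[(String, Long)]
    for ((k, v) <- sorted) {
      if (merged.nonEmpty && merged.last._1 == k) merged(merged.length - 1) = (k, merged.last._2 + v)
      else merged += ((k, v))
    }
    merged.toSeq.grouped(maxMergeRows)
  }

  def run(input: Iterator[Batch], maxMergeRows: Int): Iterator[Batch] = {
    require(maxMergeRows > 0, "maxMergeRows must be positive")
    // 1. Aggregate each input batch on its own and keep the partial results.
    val partials = input.map(aggregateBatch).toVector
    // 2. Try a single merge; 3. otherwise fall back to sort-based merging.
    tryMerge(partials, maxMergeRows) match {
      case Some(single) => Iterator.single(single)
      case None         => sortFallback(partials, maxMergeRows)
    }
  }
}
```

Because a key can appear in several partial batches, the fallback sorts before merging so that equal keys become adjacent and each group can be finalized in a single pass.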

class AggHelper extends AnyRef

Internal class used in computeAggregates for the pre, agg, and post steps

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new AggHelper(forceMerge: Boolean, isSorted: Boolean = false)

    forceMerge

    - if true, we are merging two pre-aggregated batches, so we should use the merge steps for each aggregate function

    isSorted

    - if the batch is sorted this is set to true and is passed to cuDF as an optimization hint
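To illustrate what forceMerge selects, here is a minimal sketch using a hypothetical AggFn type (not part of the plugin's API) that pairs an update step, applied to raw input rows, with a merge step, applied to partial results from earlier aggregations. For count the two steps differ: update counts rows, while merge sums the partial counts.

```scala
object ForceMergeSketch {
  // Hypothetical model of one aggregate function: `update` reduces raw input
  // values, `merge` combines partial results produced by earlier updates.
  final case class AggFn(name: String, update: Seq[Long] => Long, merge: Seq[Long] => Long)

  val count: AggFn = AggFn("count", update = vs => vs.size.toLong, merge = vs => vs.sum)
  val max: AggFn   = AggFn("max", update = vs => vs.max, merge = vs => vs.max)

  // forceMerge = true: the values are already partial aggregates, so the
  // merge step must be used instead of the update step.
  def aggregate(fn: AggFn, values: Seq[Long], forceMerge: Boolean): Long =
    if (forceMerge) fn.merge(values) else fn.update(values)
}
```

Running the update step over partial results would be wrong: for count it would return the number of partial results rather than the number of original rows, which is why merging pre-aggregated batches has to use the merge steps.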

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  9. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  10. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  11. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  13. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  14. def performGroupByAggregation(preProcessed: ColumnarBatch): ColumnarBatch

    Used to produce a group-by aggregate

    preProcessed

    the batch after the "pre" step

    returns

    a ColumnarBatch that has been aggregated by cuDF

  15. def performReduction(preProcessed: ColumnarBatch): ColumnarBatch

    Invoke the reduction functions defined in each CudfAggregate.

    preProcessed

    - a batch after the "pre" step

  16. def postProcess(resultBatch: ColumnarBatch): ColumnarBatch

    Used to produce the outbound batch from the aggregate, which could be shuffled or, if we are in the final stage, passed through evaluateExpression. It takes a cuDF-aggregated batch and applies the "post" step: postUpdate for update, or postMerge for merge.

    resultBatch

    - cuDF aggregated batch

    returns

    output batch from the aggregate

  17. def preProcess(toAggregateBatch: ColumnarBatch): ColumnarBatch

    Apply the "pre" step: preMerge for merge, or a pass-through in the update case.

    toAggregateBatch

    - the batch to aggregate: it comes directly from the child in the merge case, or from the inputProjection in the update case.

    returns

    a pre-processed batch that can be later cuDF aggregated

  18. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  19. def toString(): String
    Definition Classes
    AnyRef → Any
  20. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  22. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
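The pre, agg, and post steps of AggHelper can be sketched as a small CPU-only class. Everything here is hypothetical: MiniAggHelper, groupBySum, reduceSum, and the injected pre/post functions stand in for preProcess, performGroupByAggregation, performReduction, and postProcess, with a per-key sum as the only aggregate and row-oriented batches instead of columnar ones. Choosing between a group-by and a reduction based on whether grouping keys exist is also an assumption of this sketch.

```scala
object AggHelperSketch {
  // Hypothetical row-oriented batch of (grouping key, value); real batches are columnar.
  type Batch = Seq[(String, Long)]

  // Hypothetical stand-in for AggHelper. `pre` and `post` are injected functions
  // standing in for the per-aggregate preMerge / postUpdate / postMerge steps.
  final class MiniAggHelper(
      isSorted: Boolean,
      pre: Batch => Batch = identity,
      post: Batch => Batch = identity) {

    // Analogue of preProcess.
    def preProcess(b: Batch): Batch = pre(b)

    // Analogue of performGroupByAggregation: a per-key sum. When the input is
    // sorted by key, equal keys are adjacent and one linear scan suffices;
    // otherwise a hash map is used (output sorted only for determinism).
    def groupBySum(b: Batch): Batch =
      if (isSorted) {
        b.foldLeft(List.empty[(String, Long)]) {
          case ((k, acc) :: rest, (key, v)) if k == key => (k, acc + v) :: rest
          case (out, row)                               => row :: out
        }.reverse
      } else {
        b.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2).sum }.toSeq.sortBy(_._1)
      }

    // Analogue of performReduction: no grouping keys, so the whole batch
    // collapses to a single row (the empty key is only a placeholder).
    def reduceSum(b: Batch): Batch = Seq("" -> b.map(_._2).sum)

    // Analogue of postProcess.
    def postProcess(b: Batch): Batch = post(b)

    // pre -> agg -> post, using a group-by or a reduction for the agg step.
    def compute(b: Batch, hasGroupingKeys: Boolean): Batch = {
      val preProcessed = preProcess(b)
      val aggregated = if (hasGroupingKeys) groupBySum(preProcessed) else reduceSum(preProcessed)
      postProcess(aggregated)
    }
  }
}
```

The isSorted branch is only correct when equal keys really are adjacent; given an unsorted batch, the linear scan would emit the same key more than once. That is why the flag is an optimization hint that should be set only when the batch is known to be sorted.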

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated
