com.nvidia.spark.rapids

BatchedRunningWindowFixer

trait BatchedRunningWindowFixer extends AutoCloseable

Provides a way to process running window operations without needing to buffer and split the batches on partition-by boundaries. Without that buffering, part of a partition-by key set may have been processed in the previous batch, and the results for the rest of it will need to be updated. For example, if we are doing a running min operation, we may first get in something like

PARTS: 1, 1, 2, 2
VALUES: 2, 3, 10, 9

Processing this produces a new column that looks like

MINS: 2, 2, 10, 9

But we don't know whether the group with 2 in PARTS is done or not, so the fixer saves the last value in MINS, which is 9. Then the next batch shows up

PARTS: 2, 2, 3, 3
VALUES: 11, 5, 13, 14

We generate the window result again and get

MINS: 11, 5, 13, 13

But we cannot output this yet because there may have been overlap with the previous batch. The framework will figure that out and pass data into fixUp to do the fixing. It will pass in MINS, and also a column of boolean values (true, true, false, false) to indicate which rows overlapped with the previous batch. In our min example, fixUp will take the min of the last value in the previous batch and each value that could overlap with it.

RESULT: 9, 5, 13, 13 which can be output.
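The walkthrough above can be modeled in plain Scala. This is only an illustrative sketch, not the plugin's implementation: `Seq[Int]` and `Seq[Boolean]` stand in for GPU columns, and the object `RunningMinFixerSketch` with its `runningMin` and `fixUp` helpers are hypothetical names introduced here (this `fixUp` has a simplified signature, not the trait's).

```scala
// Plain-Scala model of the running-min example; Seq values stand in for GPU columns.
// All names here are hypothetical and exist only for illustration.
object RunningMinFixerSketch {
  // Running min within each partition of a single batch,
  // i.e. what the window operation produces before any fixing.
  def runningMin(parts: Seq[Int], values: Seq[Int]): Seq[Int] = {
    parts.zip(values).foldLeft((Vector.empty[Int], Option.empty[(Int, Int)])) {
      case ((out, prev), (p, v)) =>
        val m = prev match {
          case Some((prevPart, prevMin)) if prevPart == p => math.min(prevMin, v)
          case _ => v // first row of a new partition
        }
        (out :+ m, Some((p, m)))
    }._1
  }

  // Simplified stand-in for fixUp: rows flagged true belong to the same
  // partition as the last row of the previous batch, so they are combined
  // with the value saved from that batch. Other rows pass through unchanged.
  def fixUp(samePartition: Seq[Boolean], mins: Seq[Int], previousLast: Option[Int]): Seq[Int] =
    previousLast match {
      case Some(prev) =>
        samePartition.zip(mins).map { case (same, m) => if (same) math.min(prev, m) else m }
      case None => mins // no earlier batch, nothing to fix
    }
}
```

With the two batches from the example, `runningMin` yields 2, 2, 10, 9 and then 11, 5, 13, 13. Passing the second result to `fixUp` with the mask true, true, false, false and the saved value 9 gives 9, 5, 13, 13.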

Linear Supertypes
AutoCloseable, AnyRef, Any

Abstract Value Members

  1. abstract def close(): Unit
    Definition Classes
    AutoCloseable
    Annotations
    @throws( classOf[java.lang.Exception] )
  2. abstract def fixUp(samePartitionMask: Either[ColumnVector, Boolean], sameOrderMask: Option[Either[ColumnVector, Boolean]], windowedColumnOutput: ColumnView): ColumnVector

    Fix up windowedColumnOutput with any stored state from previous batches. Like all window operations, the input data will have been sorted by the partition by columns and the order by columns.

    samePartitionMask

    a mask that uses true to indicate that the row has the same partition by keys as the last row in the previous batch, or false to indicate that it does not. If this is known to be all true or all false values, a single boolean is used. If it can change for different rows, then a column vector is provided. Only values that are for the same partition by keys should be modified. Because the input data is sorted by the partition by columns, the boolean values will be grouped together.

    sameOrderMask

    a mask just like samePartitionMask but for ordering. It is used by some operations, like rank and dense_rank, that use the ordering columns in a row-based query. It is not needed for all fixers and is not free to calculate, so needsOrderMask must return true if you are going to use it.

    windowedColumnOutput

    the output of the windowAggregation without anything fixed/modified. This should not be closed by fixUp as it will be handled by the framework.

    returns

    a fixed ColumnVector with outputs updated for the rows that are in the same partition by group as the last row in the previous batch.
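The `Either` shape of samePartitionMask described above can be handled with a pattern match. The following is a minimal plain-Scala sketch under stated assumptions: `Seq[Boolean]` and `Seq[Int]` replace `ColumnVector` and `ColumnView`, the order mask is omitted, and `MaskHandlingSketch`, `Mask`, and `fixRunningMin` are hypothetical names, not part of the real API.

```scala
// Hypothetical model of how a running-min fixer might branch on the mask.
// Seq[Boolean] stands in for the ColumnVector carried in Left.
object MaskHandlingSketch {
  type Mask = Either[Seq[Boolean], Boolean]

  def fixRunningMin(samePartitionMask: Mask, output: Seq[Int], previous: Option[Int]): Seq[Int] =
    (previous, samePartitionMask) match {
      // No stored state yet: nothing to fix.
      case (None, _) => output
      // Scalar false: no row continues the previous batch's partition.
      case (Some(_), Right(false)) => output
      // Scalar true: every row continues the previous batch's partition.
      case (Some(prev), Right(true)) => output.map(v => math.min(prev, v))
      // Per-row mask: only rows flagged true are combined with the stored value.
      case (Some(prev), Left(mask)) =>
        output.zip(mask).map { case (v, same) => if (same) math.min(prev, v) else v }
    }
}
```

The scalar cases let an implementation skip per-row work when the whole batch is known to be entirely inside, or entirely outside, the previous batch's last partition.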

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  9. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  10. def incRef(col: ColumnView): ColumnVector
    Attributes
    protected
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def needsOrderMask: Boolean
  14. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  15. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  16. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  17. def toString(): String
    Definition Classes
    AnyRef → Any
  18. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  20. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated
