trait BatchedRunningWindowFixer extends AutoCloseable
Provides a way to process running window operations without needing to buffer and split the
batches on partition by boundaries. Because batches are not split on those boundaries, part of
a partition by key set may have been processed in the previous batch, and the results for the
rest of it will need to be updated. For example, if we are doing a running min operation we may
first get in something like
PARTS: 1, 1, 2, 2
VALUES: 2, 3, 10, 9
Processing this produces a new column that looks like
MINS: 2, 2, 10, 9
But we don't know if the group with 2 in PARTS is done or not, so the fixer saves the last value in MINS, which is 9. Then the next batch shows up
PARTS: 2, 2, 3, 3
VALUES: 11, 5, 13, 14
We generate the window result again and get
MINS: 11, 5, 13, 13
But we cannot output this yet because there may have been overlap with the previous batch.
The framework will figure that out and pass data into fixUp to do the fixing. It will
pass in MINS, and also a column of boolean values true, true, false, false to indicate
which rows overlapped with the previous batch. In our min example fixUp will do a min
between the last value in the previous batch and the values that could overlap with it.
RESULT: 9, 5, 13, 13
which can be output.
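The running min walkthrough above can be modeled in plain Scala. This is only a sketch: Seq[Int] and Seq[Boolean] stand in for GPU columns, and RunningMinSketch, runningMin and MinFixerSketch are hypothetical names that are not part of this API.

```scala
object RunningMinSketch {
  // Per-batch running min within each partition, before any fixing.
  // Assumes the rows are already sorted by the partition key.
  def runningMin(parts: Seq[Int], values: Seq[Int]): List[Int] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[Int]
    for (i <- values.indices) {
      val continues = i > 0 && parts(i) == parts(i - 1)
      out += (if (continues) math.min(out(i - 1), values(i)) else values(i))
    }
    out.toList
  }

  // Hypothetical stand-in for a fixer: it remembers the last fixed min and
  // applies it to the rows that overlap with the previous batch.
  final class MinFixerSketch {
    private var previousMin: Option[Int] = None

    def fixUp(samePartitionMask: Seq[Boolean], windowedOutput: Seq[Int]): List[Int] = {
      val fixed = windowedOutput.zip(samePartitionMask).map {
        // Overlapping row: combine with the saved value, if there is one.
        case (value, true) => previousMin.fold(value)(prev => math.min(prev, value))
        // Row from a new partition: leave it untouched.
        case (value, false) => value
      }.toList
      // Save the last fixed value for the next batch.
      previousMin = fixed.lastOption.orElse(previousMin)
      fixed
    }
  }
}
```

Feeding the two batches from the example through this sketch, with an all-false mask for the first batch and true, true, false, false for the second, yields 2, 2, 10, 9 and then 9, 5, 13, 13.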
Abstract Value Members
- abstract def close(): Unit
- Definition Classes
- AutoCloseable
- Annotations
- @throws( classOf[java.lang.Exception] )
- abstract def fixUp(samePartitionMask: Either[ColumnVector, Boolean], sameOrderMask: Option[Either[ColumnVector, Boolean]], windowedColumnOutput: ColumnView): ColumnVector
Fix up windowedColumnOutput with any stored state from previous batches. Like all window operations the input data will have been sorted by the partition by columns and the order by columns.
- samePartitionMask
a mask that uses true to indicate the row is for the same partition by keys as the last row in the previous batch, or false to indicate it is not. If this is known to be all true or all false values a single boolean is used. If it can change for different rows then a column vector is provided. Only values that are for the same partition by keys should be modified. Because the input data is sorted by the partition by columns the boolean values will be grouped together.
- sameOrderMask
a mask just like samePartitionMask but for ordering. This happens for some operations like rank and dense_rank that use the ordering columns in a row based query. This is not needed for all fixers and is not free to calculate, so you must set needsOrderMask to true if you are going to use it.
- windowedColumnOutput
the output of the windowAggregation without anything fixed/modified. This should not be closed by fixUp as it will be handled by the framework.
- returns
a fixed ColumnVector with outputs updated for items that were in the same group by key as the last row in the previous batch.
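The two shapes of samePartitionMask can be handled with a pattern match. The sketch below is plain Scala with Seq[Boolean] and Seq[Int] standing in for column vectors. MaskSketch, Mask and fixMin are hypothetical names, and the previous batch's min is passed in explicitly rather than stored in the fixer.

```scala
object MaskSketch {
  // Stand-in for Either[ColumnVector, Boolean]:
  // Left is a per-row mask, Right is a single value that applies to every row.
  type Mask = Either[Seq[Boolean], Boolean]

  def fixMin(mask: Mask, previousMin: Int, output: Seq[Int]): List[Int] = mask match {
    // No row overlaps with the previous batch, so nothing needs fixing.
    case Right(false) => output.toList
    // Every row overlaps, so every row is combined with the saved value.
    case Right(true) => output.map(math.min(previousMin, _)).toList
    // Mixed case: only rows flagged true are modified.
    case Left(rows) =>
      output.zip(rows).map { case (v, same) =>
        if (same) math.min(previousMin, v) else v
      }.toList
  }
}
```

A real fixer would do the same selection with column operations instead of a row-by-row map, and would return a new column without closing windowedColumnOutput.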
Concrete Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def incRef(col: ColumnView): ColumnVector
- Attributes
- protected
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def needsOrderMask: Boolean
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )