Class TemporalRowTimeJoinOperator

java.lang.Object
org.apache.flink.streaming.api.operators.AbstractStreamOperator<org.apache.flink.table.data.RowData>
org.apache.flink.table.runtime.operators.join.temporal.BaseTwoInputStreamOperatorWithStateRetention
org.apache.flink.table.runtime.operators.join.temporal.TemporalRowTimeJoinOperator
All Implemented Interfaces:
Serializable, org.apache.flink.api.common.state.CheckpointListener, org.apache.flink.streaming.api.operators.KeyContext, org.apache.flink.streaming.api.operators.KeyContextHandler, org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.CheckpointedStreamOperator, org.apache.flink.streaming.api.operators.Triggerable<Object,org.apache.flink.runtime.state.VoidNamespace>, org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.YieldingOperator<org.apache.flink.table.data.RowData>

public class TemporalRowTimeJoinOperator extends BaseTwoInputStreamOperatorWithStateRetention
The operator for a temporal join (FOR SYSTEM_TIME AS OF o.rowtime) on row time. It places no limitation on the message types of the left and right inputs, which means the operator handles changelog input well.

For an event-time temporal join, the probe side is a regular table and the build side is a versioned table; the versions of the versioned table can be extracted from the build side state. The operator works by keeping probe and build records in state and processing them on the next watermark. The idea is that between watermarks we collect those elements, and once we are sure that there will be no more updates, we emit the correct result and clean up the expired data in state.
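The buffering behaviour described above can be illustrated with a minimal, single-key sketch in plain Java. It is not the operator's actual code: the class name TemporalJoinSketch, its methods, and the String rows are hypothetical, it models an inner join only, and it assumes at most one probe row per event time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-key model; the real operator keeps this data in Flink keyed state.
class TemporalJoinSketch {
    // Build-side versions, ordered by version (event) time.
    private final TreeMap<Long, String> buildVersions = new TreeMap<>();
    // Probe-side rows waiting for the next watermark, ordered by event time.
    private final TreeMap<Long, String> pendingProbes = new TreeMap<>();

    void addBuild(long versionTime, String value) {
        buildVersions.put(versionTime, value);
    }

    void addProbe(long eventTime, String row) {
        pendingProbes.put(eventTime, row);
    }

    // Joins every buffered probe row with event time <= watermark against the
    // latest build version whose time is <= the probe's event time.
    List<String> onWatermark(long watermark) {
        List<String> out = new ArrayList<>();
        Map<Long, String> ready = pendingProbes.headMap(watermark, true);
        for (Map.Entry<Long, String> probe : ready.entrySet()) {
            Map.Entry<Long, String> version = buildVersions.floorEntry(probe.getKey());
            if (version != null) {
                out.add(probe.getValue() + "->" + version.getValue());
            }
        }
        ready.clear(); // the view is backed by pendingProbes, so emitted rows are removed
        return out;
    }

    public static void main(String[] args) {
        TemporalJoinSketch join = new TemporalJoinSketch();
        join.addBuild(1L, "rate=100");
        join.addBuild(5L, "rate=110");
        join.addProbe(3L, "order-A");
        join.addProbe(7L, "order-B");
        System.out.println(join.onWatermark(10L));
    }
}
```

In this model nothing is emitted when a probe row arrives; results appear only once the watermark guarantees that no earlier build version can still show up.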

Cleaning up the state drops all of the "old" values from the probe side, where "old" is defined as older than the current watermark. The build side is cleaned up in a similar fashion; however, we always keep at least one record, the latest one, even if it's past the last watermark.
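One possible reading of this cleanup rule, sketched for a single key in plain Java. The class StateCleanupSketch and its cleanUp method are hypothetical, and two details are assumptions: probe rows at exactly the watermark are treated as expired, and "the latest one" is taken to mean the latest build version at or before the watermark.

```java
import java.util.TreeMap;

// Hypothetical model of the cleanup rule; not the operator's actual implementation.
class StateCleanupSketch {
    static void cleanUp(TreeMap<Long, String> probes,
                        TreeMap<Long, String> builds,
                        long watermark) {
        // Probe side: drop everything at or before the watermark (assumed inclusive).
        probes.headMap(watermark, true).clear();

        // Build side: find the latest version at or before the watermark and
        // drop only the versions strictly older than it, so that later probe
        // rows can still be joined against that version.
        Long latestExpired = builds.floorKey(watermark);
        if (latestExpired != null) {
            builds.headMap(latestExpired, false).clear();
        }
    }

    public static void main(String[] args) {
        TreeMap<Long, String> probes = new TreeMap<>();
        TreeMap<Long, String> builds = new TreeMap<>();
        for (long t : new long[] {1, 2, 5, 8, 9, 11}) {
            probes.put(t, "probe@" + t);
        }
        for (long t : new long[] {1, 4, 8, 12}) {
            builds.put(t, "version@" + t);
        }
        cleanUp(probes, builds, 10L);
        System.out.println(probes.keySet() + " " + builds.keySet());
    }
}
```

Keeping one expired build version matters because a probe row arriving after the watermark still needs the version that was valid at its event time.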

One more trick is how emitting results and cleaning up are triggered. This is achieved by registering timers for the keys. We could register a timer for every probe and build side element's event time (when the watermark exceeds this timer, that's when we emit and/or clean up the state). However, this would cause a huge number of registered timers. For example, with the following event times of probe records accumulated: {1, 2, 5, 8, 9}, if we received Watermark(10), it would trigger 5 separate timers for the same key. To avoid that, we always keep only a single registered timer for any given key, registered for the minimal value. Upon triggering it, we process all records with event times older than or equal to currentWatermark.
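The single-timer idea can be modelled without Flink in a few lines of Java. This is a sketch under stated assumptions: SingleTimerSketch, addRecord and advanceWatermark are hypothetical names, only event times are tracked (not rows), and the timer is a plain field rather than a timer registered with Flink's timer service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical single-key model of "one registered timer at the minimal event time".
class SingleTimerSketch {
    private final TreeSet<Long> pendingEventTimes = new TreeSet<>();
    private Long registeredTimer = null; // at most one timer for this key
    int timerFirings = 0;

    void addRecord(long eventTime) {
        pendingEventTimes.add(eventTime);
        // Keep the single timer at the minimal pending event time.
        if (registeredTimer == null || eventTime < registeredTimer) {
            registeredTimer = eventTime;
        }
    }

    // Fires the timer at most once per call and processes every record whose
    // event time is <= watermark in that single firing.
    List<Long> advanceWatermark(long watermark) {
        List<Long> processed = new ArrayList<>();
        if (registeredTimer != null && registeredTimer <= watermark) {
            timerFirings++;
            while (!pendingEventTimes.isEmpty() && pendingEventTimes.first() <= watermark) {
                processed.add(pendingEventTimes.pollFirst());
            }
            // Re-register for the minimal remaining event time, if any.
            registeredTimer = pendingEventTimes.isEmpty() ? null : pendingEventTimes.first();
        }
        return processed;
    }

    public static void main(String[] args) {
        SingleTimerSketch key = new SingleTimerSketch();
        for (long t : new long[] {1, 2, 5, 8, 9}) {
            key.addRecord(t);
        }
        // prints "[1, 2, 5, 8, 9] firings=1"
        System.out.println(key.advanceWatermark(10L) + " firings=" + key.timerFirings);
    }
}
```

With the event times from the example above, Watermark(10) leads to one firing that handles all five records, instead of five separate timers.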

  • Field Summary

    Fields inherited from class org.apache.flink.table.runtime.operators.join.temporal.BaseTwoInputStreamOperatorWithStateRetention

    stateCleaningEnabled

    Fields inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator

    combinedWatermark, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, LOG, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager
  • Constructor Summary

    Constructors
    Constructor
    Description
    TemporalRowTimeJoinOperator(InternalTypeInfo<org.apache.flink.table.data.RowData> leftType, InternalTypeInfo<org.apache.flink.table.data.RowData> rightType, GeneratedJoinCondition generatedJoinCondition, int leftTimeAttribute, int rightTimeAttribute, long minRetentionTime, long maxRetentionTime, boolean isLeftOuterJoin)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    cleanupState(long time)
    The method to be called when a cleanup timer fires.
    void
    close()
     
    void
    onEventTime(org.apache.flink.streaming.api.operators.InternalTimer<Object,org.apache.flink.runtime.state.VoidNamespace> timer)
     
    void
    open()
     
    void
    processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
     
    void
    processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
     

    Methods inherited from class org.apache.flink.table.runtime.operators.join.temporal.BaseTwoInputStreamOperatorWithStateRetention

    cleanupLastTimer, onProcessingTime, registerProcessingCleanupTimer, useSplittableTimers

    Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator

    beforeInitializeStateHandler, finish, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isAsyncStateProcessingEnabled, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark, processWatermark, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, snapshotState

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.flink.api.common.state.CheckpointListener

    notifyCheckpointAborted, notifyCheckpointComplete

    Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContext

    getCurrentKey, setCurrentKey

    Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler

    hasKeyContext

    Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator

    finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState

    Methods inherited from interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator

    processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus1, processWatermarkStatus2
  • Constructor Details

    • TemporalRowTimeJoinOperator

      public TemporalRowTimeJoinOperator(InternalTypeInfo<org.apache.flink.table.data.RowData> leftType, InternalTypeInfo<org.apache.flink.table.data.RowData> rightType, GeneratedJoinCondition generatedJoinCondition, int leftTimeAttribute, int rightTimeAttribute, long minRetentionTime, long maxRetentionTime, boolean isLeftOuterJoin)
  • Method Details

    • open

      public void open() throws Exception
      Specified by:
      open in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
      Overrides:
      open in class BaseTwoInputStreamOperatorWithStateRetention
      Throws:
      Exception
    • processElement1

      public void processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception
      Throws:
      Exception
    • processElement2

      public void processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception
      Throws:
      Exception
    • onEventTime

      public void onEventTime(org.apache.flink.streaming.api.operators.InternalTimer<Object,org.apache.flink.runtime.state.VoidNamespace> timer) throws Exception
      Throws:
      Exception
    • close

      public void close() throws Exception
      Specified by:
      close in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
      Overrides:
      close in class org.apache.flink.streaming.api.operators.AbstractStreamOperator<org.apache.flink.table.data.RowData>
      Throws:
      Exception
    • cleanupState

      public void cleanupState(long time)
      The method to be called when a cleanup timer fires.
      Specified by:
      cleanupState in class BaseTwoInputStreamOperatorWithStateRetention
      Parameters:
      time - The timestamp of the fired timer.