Class TemporalRowTimeJoinOperator
- All Implemented Interfaces:
Serializable, org.apache.flink.api.common.state.CheckpointListener, org.apache.flink.streaming.api.operators.KeyContext, org.apache.flink.streaming.api.operators.KeyContextHandler, org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.CheckpointedStreamOperator, org.apache.flink.streaming.api.operators.Triggerable<Object, org.apache.flink.runtime.state.VoidNamespace>, org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData, org.apache.flink.table.data.RowData, org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.YieldingOperator<org.apache.flink.table.data.RowData>
For an event-time temporal join, the probe side is a regular table and the build side is a versioned table; the versions of the versioned table can be extracted from the build-side state. This operator works by keeping the probe and build records in state so that they can be processed on the next watermark. Between watermarks we collect those elements, and once we are sure that there will be no more updates we emit the correct result and clean up the expired data in state.
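The lookup of a build-side version for a probe record can be illustrated with a minimal sketch. It assumes that the version valid for a probe record is the build record with the greatest event time that is less than or equal to the probe record's event time. The class VersionLookupSketch, the method versionAt and the String payloads are hypothetical and are not part of this operator.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the operator's code: build-side versions sorted by event time.
public class VersionLookupSketch {

    /**
     * Returns the build-side version valid at probeTime, or null if no version
     * with an event time at or before probeTime exists.
     * Assumption: "valid" means the greatest event time <= probeTime.
     */
    static String versionAt(TreeMap<Long, String> buildVersions, long probeTime) {
        Map.Entry<Long, String> entry = buildVersions.floorEntry(probeTime);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> rates = new TreeMap<>();
        rates.put(1L, "rate=1.10");
        rates.put(5L, "rate=1.15");

        System.out.println(versionAt(rates, 3L)); // version written at time 1
        System.out.println(versionAt(rates, 5L)); // version written at time 5
        System.out.println(versionAt(rates, 0L)); // no version yet
    }
}
```

With isLeftOuterJoin set, a probe record without a matching version would presumably still be emitted with a null-padded build side; the sketch only covers the lookup itself.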
Cleaning up the state drops all of the "old" values from the probe side, where "old" is defined as older than the current watermark. The build side is cleaned up in a similar fashion; however, we always keep at least one record, the latest one, even if it is past the last watermark.
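The cleanup rule can be sketched as follows. This is an illustration under stated assumptions, not the operator's implementation: state is modelled as a TreeMap keyed by event time, the class StateCleanupSketch and its methods are hypothetical, and the boundary is taken as inclusive (records at or before the watermark count as expired).

```java
import java.util.TreeMap;

// Hypothetical sketch of the cleanup rule; state is modelled as event time -> payload.
public class StateCleanupSketch {

    /** Drops every probe record whose event time is at or before the watermark. */
    static void cleanProbe(TreeMap<Long, String> probe, long watermark) {
        // headMap returns a live view, so clearing it removes the entries from probe.
        probe.headMap(watermark, true).clear();
    }

    /**
     * Drops expired build records but keeps the latest record at or before the
     * watermark, so a later probe record can still find its version.
     */
    static void cleanBuild(TreeMap<Long, String> build, long watermark) {
        Long latestExpired = build.floorKey(watermark);
        if (latestExpired != null) {
            // Remove everything strictly older than the record we keep.
            build.headMap(latestExpired, false).clear();
        }
    }

    public static void main(String[] args) {
        TreeMap<Long, String> build = new TreeMap<>();
        build.put(1L, "v1");
        build.put(5L, "v2");
        build.put(12L, "v3");
        cleanBuild(build, 10L);
        System.out.println(build.keySet()); // prints [5, 12]
    }
}
```

Keeping the record at time 5 matters because a probe record arriving with event time 11 would still join against it.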
One more trick is how emitting results and cleaning up are triggered. This is achieved by registering timers for the keys. We could register a timer for every probe and build side element's event time (when the watermark exceeds this timer, that is when we emit and/or clean up the state). However, this would cause a huge number of registered timers. For example, with the following event times of accumulated probe records: {1, 2, 5, 8, 9}, receiving Watermark(10) would trigger 5 separate timers for the same key. To avoid that, we always keep only a single registered timer for any given key, registered for the minimal value. Upon triggering it, we process all records with event times older than or equal to currentWatermark.
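The single-timer idea can be sketched for one key as below. This is a simplified, hypothetical model: the class SingleTimerSketch, its fields and its methods are illustrative names, the Flink timer service is replaced by a plain field, and only event times are tracked (duplicate event times collapse into one entry).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical single-key model of "one registered timer at the minimal event time".
public class SingleTimerSketch {
    private final TreeSet<Long> bufferedEventTimes = new TreeSet<>();
    private Long registeredTimer; // null when no timer is registered
    int timerFirings = 0;

    /** Buffers a record and moves the single timer down to the new minimum if needed. */
    void onRecord(long eventTime) {
        bufferedEventTimes.add(eventTime);
        if (registeredTimer == null || eventTime < registeredTimer) {
            registeredTimer = eventTime;
        }
    }

    /** Fires the timer at most once and processes every record at or before the watermark. */
    List<Long> onWatermark(long watermark) {
        List<Long> processed = new ArrayList<>();
        if (registeredTimer == null || registeredTimer > watermark) {
            return processed; // timer not due yet
        }
        timerFirings++;
        processed.addAll(bufferedEventTimes.headSet(watermark, true));
        bufferedEventTimes.headSet(watermark, true).clear();
        // Re-register for the smallest remaining event time, if any.
        registeredTimer = bufferedEventTimes.isEmpty() ? null : bufferedEventTimes.first();
        return processed;
    }

    public static void main(String[] args) {
        SingleTimerSketch key = new SingleTimerSketch();
        for (long t : new long[] {1L, 2L, 5L, 8L, 9L}) {
            key.onRecord(t);
        }
        System.out.println(key.onWatermark(10L)); // prints [1, 2, 5, 8, 9]
        System.out.println(key.timerFirings);     // prints 1
    }
}
```

With one timer per record, the same watermark would have produced five firings for this key; here a single firing handles all five records.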
-
Field Summary
Fields inherited from class org.apache.flink.table.runtime.operators.join.temporal.BaseTwoInputStreamOperatorWithStateRetention
stateCleaningEnabled
Fields inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
combinedWatermark, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, LOG, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager
-
Constructor Summary
Constructors
TemporalRowTimeJoinOperator(InternalTypeInfo<org.apache.flink.table.data.RowData> leftType, InternalTypeInfo<org.apache.flink.table.data.RowData> rightType, GeneratedJoinCondition generatedJoinCondition, int leftTimeAttribute, int rightTimeAttribute, long minRetentionTime, long maxRetentionTime, boolean isLeftOuterJoin)
-
Method Summary
Modifier and Type, Method, Description
void cleanupState(long time) - The method to be called when a cleanup timer fires.
void close()
void onEventTime(org.apache.flink.streaming.api.operators.InternalTimer<Object, org.apache.flink.runtime.state.VoidNamespace> timer)
void open()
void processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
void processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
Methods inherited from class org.apache.flink.table.runtime.operators.join.temporal.BaseTwoInputStreamOperatorWithStateRetention
cleanupLastTimer, onProcessingTime, registerProcessingCleanupTimer, useSplittableTimers
Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
beforeInitializeStateHandler, finish, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isAsyncStateProcessingEnabled, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark, processWatermark, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, snapshotState
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.flink.api.common.state.CheckpointListener
notifyCheckpointAborted, notifyCheckpointComplete
Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContext
getCurrentKey, setCurrentKey
Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler
hasKeyContext
Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator
finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState
Methods inherited from interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator
processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus1, processWatermarkStatus2
-
Constructor Details
-
TemporalRowTimeJoinOperator
public TemporalRowTimeJoinOperator(InternalTypeInfo<org.apache.flink.table.data.RowData> leftType, InternalTypeInfo<org.apache.flink.table.data.RowData> rightType, GeneratedJoinCondition generatedJoinCondition, int leftTimeAttribute, int rightTimeAttribute, long minRetentionTime, long maxRetentionTime, boolean isLeftOuterJoin)
-
-
Method Details
-
open
- Specified by:
open in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
- Overrides:
open in class BaseTwoInputStreamOperatorWithStateRetention
- Throws:
Exception
-
processElement1
public void processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception
- Throws:
Exception
-
processElement2
public void processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception
- Throws:
Exception
-
onEventTime
public void onEventTime(org.apache.flink.streaming.api.operators.InternalTimer<Object, org.apache.flink.runtime.state.VoidNamespace> timer) throws Exception
- Throws:
Exception
-
close
- Specified by:
close in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
- Overrides:
close in class org.apache.flink.streaming.api.operators.AbstractStreamOperator<org.apache.flink.table.data.RowData>
- Throws:
Exception
-
cleanupState
public void cleanupState(long time)
The method to be called when a cleanup timer fires.
- Specified by:
cleanupState in class BaseTwoInputStreamOperatorWithStateRetention
- Parameters:
time - The timestamp of the fired timer.
-