Class IntervalJoinOperator<K,T1,T2,OUT>
- Type Parameters:
K- The type of the key based on which we join elements.T1- The type of the elements in the left stream.T2- The type of the elements in the right stream.OUT- The output type created by the user-defined function.
- All Implemented Interfaces:
Serializable,org.apache.flink.api.common.state.CheckpointListener,KeyContext,KeyContextHandler,org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>,StreamOperator<OUT>,StreamOperatorStateHandler.CheckpointedStreamOperator,Triggerable<K,,String> TwoInputStreamOperator<T1,,T2, OUT> UserFunctionProvider<ProcessJoinFunction<T1,,T2, OUT>> YieldingOperator<OUT>
operator to execute time-bounded stream inner joins.
By using a configurable lower and upper bound this operator will emit exactly those pairs (T1, T2) where t2.ts ∈ [T1.ts + lowerBound, T1.ts + upperBound]. Both the lower and the upper bound can be configured to be either inclusive or exclusive.
As soon as elements are joined they are passed to a user-defined ProcessJoinFunction.
The basic idea of this implementation is as follows: Whenever we receive an element at processElement1(StreamRecord) (a.k.a. the left side), we add it to the left buffer. We then
check the right buffer to see whether there are any elements that can be joined. If there are,
they are joined and passed to the aforementioned function. The same happens the other way around
when receiving an element on the right side.
Whenever a pair of elements is emitted it will be assigned the max timestamp of either of the elements.
In order to avoid the element buffers to grow indefinitely a cleanup timer is registered per element. This timer indicates when an element is not considered for joining anymore and can be removed from the state.
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic classA container for elements put in the left/write buffer.static classAserializerfor theIntervalJoinOperator.BufferEntry.static final classATypeSerializerSnapshotforIntervalJoinOperator.BufferEntrySerializer. -
Field Summary
Fields inherited from class org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator
userFunctionFields inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
combinedWatermark, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, LOG, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager -
Constructor Summary
ConstructorsConstructorDescriptionIntervalJoinOperator(long lowerBound, long upperBound, boolean lowerBoundInclusive, boolean upperBoundInclusive, org.apache.flink.util.OutputTag<T1> leftLateDataOutputTag, org.apache.flink.util.OutputTag<T2> rightLateDataOutputTag, org.apache.flink.api.common.typeutils.TypeSerializer<T1> leftTypeSerializer, org.apache.flink.api.common.typeutils.TypeSerializer<T2> rightTypeSerializer, ProcessJoinFunction<T1, T2, OUT> udf) Creates a new IntervalJoinOperator. -
Method Summary
Modifier and TypeMethodDescriptionvoidStream operators with state which can be restored need to override this hook method.voidonEventTime(InternalTimer<K, String> timer) Invoked when an event-time timer fires.voidonProcessingTime(InternalTimer<K, String> timer) Invoked when a processing-time timer fires.voidopen()This method is called immediately before any elements are processed, it should contain the operator's initialization logic, e.g. state initialization.voidprocessElement1(StreamRecord<T1> record) Process aStreamRecordfrom the left stream.voidprocessElement2(StreamRecord<T2> record) Process aStreamRecordfrom the right stream.protected <T> voidsideOutput(T value, long timestamp, boolean isLeft) Write skipped late arriving element to SideOutput.Methods inherited from class org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator
close, finish, getUserFunction, getUserFunctionParameters, notifyCheckpointAborted, notifyCheckpointComplete, setOutputType, setup, snapshotStateMethods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
beforeInitializeStateHandler, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, isAsyncStateProcessingEnabled, isUsingCustomRawKeyedState, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark, processWatermark, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, snapshotState, useSplittableTimersMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface org.apache.flink.api.common.state.CheckpointListener
notifyCheckpointAborted, notifyCheckpointCompleteMethods inherited from interface org.apache.flink.streaming.api.operators.KeyContext
getCurrentKey, setCurrentKeyMethods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler
hasKeyContextMethods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator
close, finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotStateMethods inherited from interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator
processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus1, processWatermarkStatus2
-
Constructor Details
-
IntervalJoinOperator
public IntervalJoinOperator(long lowerBound, long upperBound, boolean lowerBoundInclusive, boolean upperBoundInclusive, org.apache.flink.util.OutputTag<T1> leftLateDataOutputTag, org.apache.flink.util.OutputTag<T2> rightLateDataOutputTag, org.apache.flink.api.common.typeutils.TypeSerializer<T1> leftTypeSerializer, org.apache.flink.api.common.typeutils.TypeSerializer<T2> rightTypeSerializer, ProcessJoinFunction<T1, T2, OUT> udf) Creates a new IntervalJoinOperator.- Parameters:
lowerBound- The lower bound for evaluating if elements should be joinedupperBound- The upper bound for evaluating if elements should be joinedlowerBoundInclusive- Whether or not to include elements where the timestamp matches the lower boundupperBoundInclusive- Whether or not to include elements where the timestamp matches the upper boundudf- A user-definedProcessJoinFunctionthat gets called whenever two elements of T1 and T2 are joined
-
-
Method Details
-
open
Description copied from class:AbstractStreamOperatorThis method is called immediately before any elements are processed, it should contain the operator's initialization logic, e.g. state initialization.The default implementation does nothing.
- Specified by:
openin interfaceStreamOperator<K>- Overrides:
openin classAbstractUdfStreamOperator<OUT,ProcessJoinFunction<T1, T2, OUT>> - Throws:
Exception- An exception in this method causes the operator to fail.
-
initializeState
Description copied from class:AbstractStreamOperatorStream operators with state which can be restored need to override this hook method.- Specified by:
initializeStatein interfaceStreamOperatorStateHandler.CheckpointedStreamOperator- Overrides:
initializeStatein classAbstractUdfStreamOperator<OUT,ProcessJoinFunction<T1, T2, OUT>> - Parameters:
context- context that allows to register different states.- Throws:
Exception
-
processElement1
Process aStreamRecordfrom the left stream. Whenever anStreamRecordarrives at the left stream, it will get added to the left buffer. Possible join candidates for that element will be looked up from the right buffer and if the pair lies within the user defined boundaries, it gets passed to theProcessJoinFunction.- Specified by:
processElement1in interfaceTwoInputStreamOperator<K,T1, T2> - Parameters:
record- An incoming record to be joined- Throws:
Exception- Can throw an Exception during state access
-
processElement2
Process aStreamRecordfrom the right stream. Whenever aStreamRecordarrives at the right stream, it will get added to the right buffer. Possible join candidates for that element will be looked up from the left buffer and if the pair lies within the user defined boundaries, it gets passed to theProcessJoinFunction.- Specified by:
processElement2in interfaceTwoInputStreamOperator<K,T1, T2> - Parameters:
record- An incoming record to be joined- Throws:
Exception- Can throw an exception during state access
-
sideOutput
protected <T> void sideOutput(T value, long timestamp, boolean isLeft) Write skipped late arriving element to SideOutput. -
onEventTime
Description copied from interface:TriggerableInvoked when an event-time timer fires.- Specified by:
onEventTimein interfaceTriggerable<K,T1> - Throws:
Exception
-
onProcessingTime
Description copied from interface:TriggerableInvoked when a processing-time timer fires.- Specified by:
onProcessingTimein interfaceTriggerable<K,T1> - Throws:
Exception
-