Class AsyncIntervalJoinOperator<K,T1,T2,OUT>
- Type Parameters:
K - The type of the key based on which we join elements.
T1 - The type of the elements in the left stream.
T2 - The type of the elements in the right stream.
OUT - The output type created by the user-defined function.
- All Implemented Interfaces:
Serializable, org.apache.flink.api.common.state.CheckpointListener, KeyContext, KeyContextHandler, org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>, StreamOperator<OUT>, StreamOperatorStateHandler.CheckpointedStreamOperator, Triggerable<K, String>, TwoInputStreamOperator<T1, T2, OUT>, UserFunctionProvider<ProcessJoinFunction<T1, T2, OUT>>, YieldingOperator<OUT>, AsyncStateProcessing, AsyncStateProcessingOperator
An operator that executes time-bounded inner joins between two streams. This is
the async state access version of IntervalJoinOperator.
Given a configurable lower and upper bound, this operator emits exactly those pairs (t1, t2), with t1 of type T1 and t2 of type T2, where t2.ts ∈ [t1.ts + lowerBound, t1.ts + upperBound]. Both the lower and the upper bound can be configured to be either inclusive or exclusive.
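The join condition can be written out as a small predicate. The following is a minimal sketch in plain Java; the class `IntervalBoundsSketch` and the method `inBounds` are hypothetical names introduced for illustration and are not part of the Flink API.

```java
// Hypothetical helper, not part of Flink: evaluates the join condition described above.
final class IntervalBoundsSketch {

    /**
     * Returns true if rightTs lies between leftTs + lowerBound and
     * leftTs + upperBound, where each end of the interval is inclusive
     * or exclusive depending on the corresponding flag.
     */
    static boolean inBounds(
            long leftTs,
            long rightTs,
            long lowerBound,
            long upperBound,
            boolean lowerBoundInclusive,
            boolean upperBoundInclusive) {
        long lo = leftTs + lowerBound;
        long hi = leftTs + upperBound;
        boolean aboveLower = lowerBoundInclusive ? rightTs >= lo : rightTs > lo;
        boolean belowUpper = upperBoundInclusive ? rightTs <= hi : rightTs < hi;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Left element at t=10 with bounds -2 and +1: the window is [8, 11].
        System.out.println(inBounds(10, 8, -2, 1, true, true));   // true
        System.out.println(inBounds(10, 8, -2, 1, false, true));  // false: lower end excluded
        System.out.println(inBounds(10, 12, -2, 1, true, true));  // false: past the upper end
    }
}
```

Note that a negative lower bound lets right elements that are older than the left element join with it.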
As soon as elements are joined they are passed to a user-defined ProcessJoinFunction.
The basic idea of this implementation is as follows: Whenever we receive an element at processElement1(StreamRecord) (a.k.a. the left side), we add it to the left buffer. We then
check the right buffer to see whether there are any elements that can be joined. If there are,
they are joined and passed to the aforementioned function. The same happens the other way around
when receiving an element on the right side.
Whenever a pair of elements is emitted, it is assigned the larger of the two elements' timestamps.
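The buffering scheme and the timestamp assignment can be illustrated with a simplified, single-key, in-memory sketch. It is not the operator's actual implementation: the real operator keeps its buffers in keyed state and accesses them asynchronously. All names below (`BufferedIntervalJoinSketch`, `Joined`, `processLeft`, `processRight`) are hypothetical, and both bounds are assumed to be inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-key sketch of the two-buffer join; not the Flink implementation.
final class BufferedIntervalJoinSketch {

    /** A joined pair together with the timestamp assigned to it. */
    static final class Joined {
        final String left;
        final String right;
        final long timestamp;

        Joined(String left, String right, long timestamp) {
            this.left = left;
            this.right = right;
            this.timestamp = timestamp;
        }
    }

    private final long lowerBound; // assumed inclusive
    private final long upperBound; // assumed inclusive
    private final TreeMap<Long, List<String>> leftBuffer = new TreeMap<>();
    private final TreeMap<Long, List<String>> rightBuffer = new TreeMap<>();
    final List<Joined> output = new ArrayList<>();

    BufferedIntervalJoinSketch(long lowerBound, long upperBound) {
        if (lowerBound > upperBound) {
            throw new IllegalArgumentException("lowerBound must not exceed upperBound");
        }
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
    }

    /** Buffers a left element, then joins it with all buffered right elements in range. */
    void processLeft(String value, long ts) {
        leftBuffer.computeIfAbsent(ts, k -> new ArrayList<>()).add(value);
        Map<Long, List<String>> candidates =
                rightBuffer.subMap(ts + lowerBound, true, ts + upperBound, true);
        for (Map.Entry<Long, List<String>> e : candidates.entrySet()) {
            for (String right : e.getValue()) {
                // The emitted pair carries the larger of the two timestamps.
                output.add(new Joined(value, right, Math.max(ts, e.getKey())));
            }
        }
    }

    /** Buffers a right element, then joins it with all buffered left elements in range. */
    void processRight(String value, long ts) {
        rightBuffer.computeIfAbsent(ts, k -> new ArrayList<>()).add(value);
        // Same condition seen from the right side: leftTs in [ts - upperBound, ts - lowerBound].
        Map<Long, List<String>> candidates =
                leftBuffer.subMap(ts - upperBound, true, ts - lowerBound, true);
        for (Map.Entry<Long, List<String>> e : candidates.entrySet()) {
            for (String left : e.getValue()) {
                output.add(new Joined(left, value, Math.max(e.getKey(), ts)));
            }
        }
    }
}
```

Because every element is buffered before the opposite buffer is probed, each matching pair is emitted exactly once, by whichever of the two elements arrives second.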
To keep the element buffers from growing indefinitely, a cleanup timer is registered per element. This timer marks the point at which an element is no longer considered for joining and can be removed from the state.
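One way to derive such a cleanup time from the bounds is sketched below. This is an assumption made for illustration, not a statement of what the operator does internally: it supposes that an element is kept until event time has passed both its own timestamp and the latest timestamp of any element on the other side that could still join with it. The names `CleanupTimeSketch`, `leftCleanupTime`, and `rightCleanupTime` are hypothetical.

```java
// Hypothetical retention rule derived from the join bounds; not taken from the Flink source.
final class CleanupTimeSketch {

    /**
     * A left element at leftTs can join right elements with timestamps up to
     * leftTs + upperBound, so it is kept until that time, but never for less
     * than its own timestamp.
     */
    static long leftCleanupTime(long leftTs, long upperBound) {
        return upperBound > 0 ? leftTs + upperBound : leftTs;
    }

    /**
     * A right element at rightTs can join left elements with timestamps up to
     * rightTs - lowerBound, so it is kept until that time, but never for less
     * than its own timestamp.
     */
    static long rightCleanupTime(long rightTs, long lowerBound) {
        return lowerBound < 0 ? rightTs - lowerBound : rightTs;
    }
}
```

With bounds of -2 and +1, for example, this rule would keep a left element at timestamp 10 until time 11 and a right element at timestamp 10 until time 12.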
-
Field Summary
Fields inherited from class org.apache.flink.runtime.asyncprocessing.operators.AbstractAsyncStateUdfStreamOperator
declarationContext, userFunction
Fields inherited from class org.apache.flink.runtime.asyncprocessing.operators.AbstractAsyncStateStreamOperator
declarationManager
Fields inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
combinedWatermark, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager
-
Constructor Summary
Constructors
AsyncIntervalJoinOperator(long lowerBound, long upperBound, boolean lowerBoundInclusive, boolean upperBoundInclusive, org.apache.flink.util.OutputTag<T1> leftLateDataOutputTag, org.apache.flink.util.OutputTag<T2> rightLateDataOutputTag, org.apache.flink.api.common.typeutils.TypeSerializer<T1> leftTypeSerializer, org.apache.flink.api.common.typeutils.TypeSerializer<T2> rightTypeSerializer, ProcessJoinFunction<T1, T2, OUT> udf) - Creates a new AsyncIntervalJoinOperator.
-
Method Summary
void onEventTime(InternalTimer<K, String> timer) - Invoked when an event-time timer fires.
void onProcessingTime(InternalTimer<K, String> timer) - Invoked when a processing-time timer fires.
void open() - This method is called immediately before any elements are processed; it should contain the operator's initialization logic, e.g. state initialization.
void processElement1(StreamRecord<T1> record) - Processes a StreamRecord from the left stream.
void processElement2(StreamRecord<T2> record) - Processes a StreamRecord from the right stream.
protected <T> void sideOutput(T value, long timestamp, boolean isLeft) - Writes a skipped late-arriving element to the side output.
Methods inherited from class org.apache.flink.runtime.asyncprocessing.operators.AbstractAsyncStateUdfStreamOperator
close, finish, getUserFunction, initializeState, notifyCheckpointAborted, notifyCheckpointComplete, setOutputType, setup, snapshotState
Methods inherited from class org.apache.flink.runtime.asyncprocessing.operators.AbstractAsyncStateStreamOperator
asyncProcessWithKey, beforeInitializeStateHandler, drainStateRequests, getAsyncKeyedStateBackend, getCurrentKey, getDeclarationManager, getElementOrder, getInternalTimerService, getOrCreateKeyedState, getRecordProcessor, isAsyncStateProcessingEnabled, newKeySelected, postProcessElement, postProcessWatermark, prepareSnapshotPreBarrier, preProcessWatermark, preserveRecordOrderAndProcess, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark, processWatermark, processWatermark1, processWatermark1Internal, processWatermark2, processWatermark2Internal, processWatermarkInternal, processWatermarkStatus, processWatermarkStatus, reportOrForwardLatencyMarker, setAsyncKeyedContextElement, setKeyContextElement1, setKeyContextElement2
Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
getContainingTask, getExecutionConfig, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, isUsingCustomRawKeyedState, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processWatermark1, processWatermark2, processWatermarkStatus1, processWatermarkStatus2, setCurrentKey, setMailboxExecutor, setProcessingTimeService, snapshotState, useSplittableTimers
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.flink.api.common.state.CheckpointListener
notifyCheckpointAborted, notifyCheckpointComplete
Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContext
getCurrentKey, setCurrentKey
Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler
hasKeyContext
Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator
close, finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState
Methods inherited from interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator
processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus1, processWatermarkStatus2
-
Constructor Details
-
AsyncIntervalJoinOperator
public AsyncIntervalJoinOperator(long lowerBound, long upperBound, boolean lowerBoundInclusive, boolean upperBoundInclusive, org.apache.flink.util.OutputTag<T1> leftLateDataOutputTag, org.apache.flink.util.OutputTag<T2> rightLateDataOutputTag, org.apache.flink.api.common.typeutils.TypeSerializer<T1> leftTypeSerializer, org.apache.flink.api.common.typeutils.TypeSerializer<T2> rightTypeSerializer, ProcessJoinFunction<T1, T2, OUT> udf)
Creates a new AsyncIntervalJoinOperator.
- Parameters:
lowerBound - The lower bound for evaluating if elements should be joined
upperBound - The upper bound for evaluating if elements should be joined
lowerBoundInclusive - Whether or not to include elements where the timestamp matches the lower bound
upperBoundInclusive - Whether or not to include elements where the timestamp matches the upper bound
udf - A user-defined ProcessJoinFunction that gets called whenever two elements of T1 and T2 are joined
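Because timestamps are long values, the two inclusiveness flags can be folded into the bounds themselves: an exclusive lower bound b admits the same timestamps as an inclusive bound b + 1, and an exclusive upper bound b the same as an inclusive bound b - 1. The sketch below shows this normalization. Whether this constructor does the same internally is not stated on this page, so treat it as an assumption; `BoundNormalizationSketch` and its members are hypothetical names, and overflow at the extremes of the long range is ignored.

```java
// Hypothetical sketch: folding inclusive/exclusive flags into long-valued bounds.
final class BoundNormalizationSketch {

    final long inclusiveLower;
    final long inclusiveUpper;

    BoundNormalizationSketch(
            long lowerBound,
            long upperBound,
            boolean lowerBoundInclusive,
            boolean upperBoundInclusive) {
        if (lowerBound > upperBound) {
            throw new IllegalArgumentException("lowerBound must not exceed upperBound");
        }
        // For integer timestamps, "ts > b" is equivalent to "ts >= b + 1",
        // and "ts < b" is equivalent to "ts <= b - 1".
        this.inclusiveLower = lowerBoundInclusive ? lowerBound : lowerBound + 1;
        this.inclusiveUpper = upperBoundInclusive ? upperBound : upperBound - 1;
    }

    /** Join condition expressed with the normalized, always-inclusive bounds. */
    boolean matches(long leftTs, long rightTs) {
        return rightTs >= leftTs + inclusiveLower && rightTs <= leftTs + inclusiveUpper;
    }
}
```

After normalization the lower bound can exceed the upper bound, for example when both bounds are equal and at least one is exclusive; in that case `matches` never returns true, which is the expected result for an empty interval.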
-
-
Method Details
-
open
public void open() throws Exception
Description copied from class: AbstractStreamOperator
This method is called immediately before any elements are processed; it should contain the operator's initialization logic, e.g. state initialization.
The default implementation does nothing.
- Specified by:
open in interface StreamOperator<K>
- Overrides:
open in class AbstractAsyncStateUdfStreamOperator<OUT, ProcessJoinFunction<T1, T2, OUT>>
- Throws:
Exception - An exception in this method causes the operator to fail.
-
processElement1
public void processElement1(StreamRecord<T1> record) throws Exception
Processes a StreamRecord from the left stream. Whenever a StreamRecord arrives at the left stream, it is added to the left buffer. Possible join candidates for that element are looked up in the right buffer, and if the pair lies within the user-defined boundaries, it is passed to the ProcessJoinFunction.
- Specified by:
processElement1 in interface TwoInputStreamOperator<K, T1, T2>
- Parameters:
record - An incoming record to be joined
- Throws:
Exception - Can throw an exception during state access
-
processElement2
public void processElement2(StreamRecord<T2> record) throws Exception
Processes a StreamRecord from the right stream. Whenever a StreamRecord arrives at the right stream, it is added to the right buffer. Possible join candidates for that element are looked up in the left buffer, and if the pair lies within the user-defined boundaries, it is passed to the ProcessJoinFunction.
- Specified by:
processElement2 in interface TwoInputStreamOperator<K, T1, T2>
- Parameters:
record - An incoming record to be joined
- Throws:
Exception - Can throw an exception during state access
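The two lookups are mirror images of each other: the condition t2.ts ∈ [t1.ts + lowerBound, t1.ts + upperBound] used from the left side is equivalent to t1.ts ∈ [t2.ts - upperBound, t2.ts - lowerBound] when seen from the right side. The following sketch checks this equivalence by brute force over a small range. `MirroredBoundsSketch` and its methods are hypothetical names, and both bounds are assumed inclusive.

```java
// Hypothetical sketch: the same join condition expressed from either side.
final class MirroredBoundsSketch {

    /** Condition as evaluated when a left element probes the right buffer. */
    static boolean rightInLeftWindow(long leftTs, long rightTs, long lower, long upper) {
        return rightTs >= leftTs + lower && rightTs <= leftTs + upper;
    }

    /** Condition as evaluated when a right element probes the left buffer. */
    static boolean leftInRightWindow(long leftTs, long rightTs, long lower, long upper) {
        return leftTs >= rightTs - upper && leftTs <= rightTs - lower;
    }

    public static void main(String[] args) {
        long lower = -2;
        long upper = 1;
        // Compare both formulations for every timestamp pair in a small grid.
        for (long leftTs = 0; leftTs <= 20; leftTs++) {
            for (long rightTs = 0; rightTs <= 20; rightTs++) {
                boolean fromLeft = rightInLeftWindow(leftTs, rightTs, lower, upper);
                boolean fromRight = leftInRightWindow(leftTs, rightTs, lower, upper);
                if (fromLeft != fromRight) {
                    throw new AssertionError("mismatch at " + leftTs + ", " + rightTs);
                }
            }
        }
        System.out.println("both formulations agree");
    }
}
```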
-
sideOutput
protected <T> void sideOutput(T value, long timestamp, boolean isLeft)
Writes a skipped late-arriving element to the side output.
-
onEventTime
public void onEventTime(InternalTimer<K, String> timer) throws Exception
Description copied from interface: Triggerable
Invoked when an event-time timer fires.
- Specified by:
onEventTime in interface Triggerable<K,T1>
- Throws:
Exception
-
onProcessingTime
public void onProcessingTime(InternalTimer<K, String> timer) throws Exception
Description copied from interface: Triggerable
Invoked when a processing-time timer fires.
- Specified by:
onProcessingTime in interface Triggerable<K,T1>
- Throws:
Exception
-