Class AbstractStreamOperator<OUT>
- Type Parameters:
OUT - The output type of the operator.
- All Implemented Interfaces:
Serializable, org.apache.flink.api.common.state.CheckpointListener, KeyContext, KeyContextHandler, StreamOperator<OUT>, StreamOperatorStateHandler.CheckpointedStreamOperator, YieldingOperator<OUT>
- Direct Known Subclasses:
AbstractAsyncStateStreamOperator, AbstractUdfStreamOperator, CacheTransformationTranslator.IdentityStreamOperator, CacheTransformationTranslator.NoOpStreamOperator, ContinuousFileReaderOperator, GlobalCommitterOperator, KeyedSortPartitionOperator, QueryableAppendingStateOperator, QueryableValueStateOperator, SortPartitionOperator, SourceOperator, StreamProject, TimestampsAndWatermarksOperator
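Base class for all stream operators. Operators that contain a user function should extend the class
AbstractUdfStreamOperator instead (which is a specialized subclass of this class).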
For concrete implementations, one of the following two interfaces must also be implemented, to
mark the operator as unary or binary: OneInputStreamOperator or TwoInputStreamOperator.
Methods of StreamOperator are guaranteed not to be called concurrently. If the timer service
is used, timer callbacks are likewise guaranteed not to be called concurrently with methods
on StreamOperator.
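To make the non-concurrency guarantee concrete, here is a minimal sketch of one way such a guarantee can be achieved: record processing and timer callbacks are queued as actions and run one at a time by a single loop. This is an illustrative model only. The class `SingleThreadedMailboxSketch` and its methods are hypothetical and are not part of the Flink API, and the text above does not say that the runtime is implemented this way.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical model: all operator actions run one after another from a single queue. */
class SingleThreadedMailboxSketch {
    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    // Plain, unsynchronized state: safe only because actions never overlap.
    private final List<String> log = new ArrayList<>();

    /** Enqueues the processing of one record. */
    void submitRecord(String record) {
        mailbox.add(() -> log.add("record:" + record));
    }

    /** Enqueues a timer callback; it waits its turn like any other action. */
    void submitTimer(long timestamp) {
        mailbox.add(() -> log.add("timer:" + timestamp));
    }

    /** Runs every queued action sequentially and returns the resulting log. */
    List<String> drain() {
        Runnable action;
        while ((action = mailbox.poll()) != null) {
            action.run();
        }
        return log;
    }
}
```

Because every action runs to completion before the next one starts, the operator state in this model needs no locking.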
Note: this class is going to be removed and replaced in the future by AbstractStreamOperatorV2. However, as AbstractStreamOperatorV2 is currently experimental,
AbstractStreamOperator has not been deprecated yet.
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected org.apache.flink.api.common.eventtime.IndexedCombinedWatermarkStatus combinedWatermark
protected StreamConfig config
protected RecordAttributes lastRecordAttributes1
protected RecordAttributes lastRecordAttributes2
protected LatencyStats latencyStats
protected static final org.slf4j.Logger LOG - The logger used by the operator class and its subclasses.
protected InternalOperatorMetricGroup metrics - Metric group for the operator.
protected Output<StreamRecord<OUT>> output
protected ProcessingTimeService processingTimeService
protected StreamOperatorStateHandler stateHandler
protected org.apache.flink.api.java.functions.KeySelector<?,?> stateKeySelector1 - KeySelector for extracting a key from an element being processed.
protected org.apache.flink.api.java.functions.KeySelector<?,?> stateKeySelector2 - KeySelector for extracting a key from an element being processed.
protected InternalTimeServiceManager<?> timeServiceManager
-
Constructor Summary
Constructors -
Method Summary
Methods (Modifier and Type, Method, Description):
protected void beforeInitializeStateHandler() - Initialize necessary state components before initializing state components.
void close() - This method is called at the very end of the operator's life, both in the case of a successful completion of the operation, and in the case of a failure and canceling.
void finish() - This method is called at the end of data processing.
StreamTask<?,?> getContainingTask()
Object getCurrentKey()
org.apache.flink.api.common.ExecutionConfig getExecutionConfig() - Gets the execution config defined on the execution environment of the job to which this operator belongs.
<K,N> InternalTimerService<N> getInternalTimerService(String name, org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, Triggerable<K, N> triggerable) - Returns an InternalTimerService that can be used to query current processing time and event time and to set timers.
<K> KeyedStateBackend<K> getKeyedStateBackend()
org.apache.flink.api.common.state.KeyedStateStore getKeyedStateStore()
org.apache.flink.metrics.groups.OperatorMetricGroup getMetricGroup()
StreamConfig getOperatorConfig()
OperatorID getOperatorID()
protected String getOperatorName() - Return the operator name.
OperatorStateBackend getOperatorStateBackend()
protected <N,S extends org.apache.flink.api.common.state.State,T> S getOrCreateKeyedState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, org.apache.flink.api.common.state.StateDescriptor<S, T> stateDescriptor)
<S extends org.apache.flink.api.common.state.State,N> S getPartitionedState(N namespace, org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, org.apache.flink.api.common.state.StateDescriptor<S, ?> stateDescriptor) - Creates a partitioned state handle, using the state backend configured for this task.
protected <S extends org.apache.flink.api.common.state.State> S getPartitionedState(org.apache.flink.api.common.state.StateDescriptor<S, ?> stateDescriptor) - Creates a partitioned state handle, using the state backend configured for this task.
ProcessingTimeService getProcessingTimeService() - Returns the ProcessingTimeService responsible for getting the current processing time and registering timers.
StreamingRuntimeContext getRuntimeContext() - Returns a context that allows the operator to query information about the execution and also to interact with systems such as broadcast variables and managed state.
protected org.apache.flink.api.java.functions.KeySelector<?,?> getStateKeySelector1()
protected org.apache.flink.api.java.functions.KeySelector<?,?> getStateKeySelector2()
protected Optional<InternalTimeServiceManager<?>> getTimeServiceManager()
ClassLoader getUserCodeClassloader()
boolean hasKeyContext1() - Whether the first input of StreamOperator has "KeyContext".
boolean hasKeyContext2() - Whether the second input of StreamOperator has "KeyContext".
void initializeState(StateInitializationContext context) - Stream operators with state which can be restored need to override this hook method.
final void initializeState(StreamTaskStateInitializer streamTaskStateManager) - Provides a context to initialize all state in the operator.
boolean isAsyncStateProcessingEnabled() - Indicates whether this operator has async state processing enabled.
protected boolean isUsingCustomRawKeyedState() - Indicates whether implementations of this class write to the raw keyed state streams on snapshots, using snapshotState(StateSnapshotContext).
void notifyCheckpointAborted(long checkpointId)
void notifyCheckpointComplete(long checkpointId)
void open() - This method is called immediately before any elements are processed. It should contain the operator's initialization logic, e.g. state initialization.
void prepareSnapshotPreBarrier(long checkpointId) - This method is called when the operator should do a snapshot, before it emits its own checkpoint barrier.
void processLatencyMarker(LatencyMarker latencyMarker)
void processLatencyMarker1(LatencyMarker latencyMarker)
void processLatencyMarker2(LatencyMarker latencyMarker)
void processRecordAttributes(RecordAttributes recordAttributes)
void processRecordAttributes1(RecordAttributes recordAttributes)
void processRecordAttributes2(RecordAttributes recordAttributes)
void processWatermark(WatermarkEvent watermark)
void processWatermark(Watermark mark)
void processWatermark1(WatermarkEvent watermark)
void processWatermark1(Watermark mark)
void processWatermark2(WatermarkEvent watermark)
void processWatermark2(Watermark mark)
void processWatermarkStatus(WatermarkStatus watermarkStatus)
protected void processWatermarkStatus(WatermarkStatus watermarkStatus, int index)
final void processWatermarkStatus1(WatermarkStatus watermarkStatus)
final void processWatermarkStatus2(WatermarkStatus watermarkStatus)
protected void reportOrForwardLatencyMarker(LatencyMarker marker)
void setCurrentKey(Object key)
void setKeyContextElement1(StreamRecord record)
void setKeyContextElement2(StreamRecord record)
void setMailboxExecutor(org.apache.flink.api.common.operators.MailboxExecutor mailboxExecutor)
protected void setProcessingTimeService(ProcessingTimeService processingTimeService)
protected void setup(StreamTask<?, ?> containingTask, StreamConfig config, Output<StreamRecord<OUT>> output)
OperatorSnapshotFutures snapshotState(long checkpointId, long timestamp, CheckpointOptions checkpointOptions, CheckpointStreamFactory factory) - Called to draw a state snapshot from the operator.
void snapshotState(StateSnapshotContext context) - Stream operators with state that want to participate in a snapshot need to override this hook method.
boolean useSplittableTimers() - Can be overridden to disable splittable timers for this particular operator even if the config option is enabled.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler
hasKeyContext
Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator
getOperatorAttributes
-
Field Details
-
LOG
protected static final org.slf4j.Logger LOG
The logger used by the operator class and its subclasses.
-
config
-
output
-
combinedWatermark
protected transient org.apache.flink.api.common.eventtime.IndexedCombinedWatermarkStatus combinedWatermark -
stateKeySelector1
protected transient org.apache.flink.api.java.functions.KeySelector<?,?> stateKeySelector1
KeySelector for extracting a key from an element being processed. This is used to scope keyed state to a key. This is null if the operator is not a keyed operator.
This is for elements from the first input.
-
stateKeySelector2
protected transient org.apache.flink.api.java.functions.KeySelector<?,?> stateKeySelector2
KeySelector for extracting a key from an element being processed. This is used to scope keyed state to a key. This is null if the operator is not a keyed operator.
This is for elements from the second input.
-
stateHandler
-
timeServiceManager
-
metrics
Metric group for the operator. -
latencyStats
-
processingTimeService
-
lastRecordAttributes1
-
lastRecordAttributes2
-
-
Constructor Details
-
AbstractStreamOperator
public AbstractStreamOperator() -
AbstractStreamOperator
-
-
Method Details
-
setup
protected void setup(StreamTask<?, ?> containingTask, StreamConfig config, Output<StreamRecord<OUT>> output) -
setProcessingTimeService
-
getMetricGroup
public org.apache.flink.metrics.groups.OperatorMetricGroup getMetricGroup()
- Specified by:
getMetricGroup in interface StreamOperator<OUT>
-
beforeInitializeStateHandler
protected void beforeInitializeStateHandler()
Initialize necessary state components before initializing state components.
-
initializeState
public final void initializeState(StreamTaskStateInitializer streamTaskStateManager) throws Exception
Description copied from interface: StreamOperator
Provides a context to initialize all state in the operator.
- Specified by:
initializeState in interface StreamOperator<OUT>
- Throws:
Exception
-
isUsingCustomRawKeyedState
@Internal protected boolean isUsingCustomRawKeyedState()
Indicates whether implementations of this class write to the raw keyed state streams on snapshots, using snapshotState(StateSnapshotContext). If yes, subclasses should override this method to return true.
Subclasses need to explicitly indicate the use of raw keyed state because, internally, the AbstractStreamOperator may attempt to read from it as well to restore heap-based timers and ultimately fail with read errors. Setting this flag to true tells the AbstractStreamOperator that the data in the raw keyed state was not written by the timer services, so it skips the timer restore attempt.
Please refer to FLINK-19741 for further details.
TODO: this method can be removed once all timers are moved to be managed by state backends.
- Returns:
flag indicating whether or not this operator is writing to raw keyed state via snapshotState(StateSnapshotContext).
-
isAsyncStateProcessingEnabled
@Internal public boolean isAsyncStateProcessingEnabled()
Indicates whether this operator has async state processing enabled. Can be overridden by subclasses.
-
setMailboxExecutor
@Internal public void setMailboxExecutor(org.apache.flink.api.common.operators.MailboxExecutor mailboxExecutor)
- Specified by:
setMailboxExecutor in interface YieldingOperator<OUT>
-
useSplittableTimers
@Internal public boolean useSplittableTimers()
Can be overridden to disable splittable timers for this particular operator even if the config option is enabled. By default, splittable timers are disabled.
- Returns:
true if splittable timers should be used (subject to StreamConfig.isUnalignedCheckpointsEnabled() and StreamConfig.isUnalignedCheckpointsSplittableTimersEnabled()); false if splittable timers should never be used.
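The phrase "subject to" suggests that the operator's own answer is only one of several conditions. The sketch below shows one plausible reading, in which splittable timers take effect only when the operator opts in and both configuration checks pass. The class `SplittableTimersSketch`, the method name, and the exact way the conditions are combined are assumptions made for illustration, not something this page specifies.

```java
/** Hypothetical illustration of how the opt-in and the two config checks might combine. */
class SplittableTimersSketch {

    /**
     * @param operatorOptsIn stands in for the value returned by useSplittableTimers()
     * @param unalignedCheckpointsEnabled stands in for StreamConfig.isUnalignedCheckpointsEnabled()
     * @param splittableTimersConfigEnabled stands in for
     *     StreamConfig.isUnalignedCheckpointsSplittableTimersEnabled()
     */
    static boolean splittableTimersInEffect(
            boolean operatorOptsIn,
            boolean unalignedCheckpointsEnabled,
            boolean splittableTimersConfigEnabled) {
        // Assumed semantics: every condition must hold; any "false" disables the feature.
        return operatorOptsIn && unalignedCheckpointsEnabled && splittableTimersConfigEnabled;
    }
}
```

Under this reading, an operator that returns false from useSplittableTimers() never gets splittable timers, regardless of configuration.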
-
open
This method is called immediately before any elements are processed. It should contain the operator's initialization logic, e.g. state initialization.
The default implementation does nothing.
- Specified by:
open in interface StreamOperator<OUT>
- Throws:
Exception - An exception in this method causes the operator to fail.
-
finish
Description copied from interface: StreamOperator
This method is called at the end of data processing.
The method is expected to flush all remaining buffered data. Exceptions during this flushing of buffered data should be propagated, in order to cause the operation to be recognized as failed, because the last data items are not processed properly.
After this method is called, no more records can be produced for the downstream operators.
WARNING: It is not safe to use this method to commit any transactions or other side effects! You can use this method to flush any buffered data that can later on be committed e.g. in a CheckpointListener.notifyCheckpointComplete(long).
NOTE: This method does not need to close any resources. You should release external resources in the StreamOperator.close() method.
- Specified by:
finish in interface StreamOperator<OUT>
- Throws:
Exception - An exception in this method causes the operator to fail.
-
close
Description copied from interface: StreamOperator
This method is called at the very end of the operator's life, both in the case of a successful completion of the operation, and in the case of a failure and canceling.
This method is expected to make a thorough effort to release all resources that the operator has acquired.
NOTE: It cannot emit any records! If you need to emit records at the end of processing, do so in the StreamOperator.finish() method.
- Specified by:
close in interface StreamOperator<OUT>
- Throws:
Exception
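The division of labour between open(), finish() and close() can be illustrated with a small stand-alone model. The sketch below does not use any Flink classes. `BufferingOperatorSketch`, its `Consumer<String>` output and the `runLifecycle` driver are hypothetical stand-ins that only mirror the documented contract: initialize in open(), flush buffered records in finish(), and release resources without emitting anything in close().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a buffering operator; not a Flink class. */
class BufferingOperatorSketch {
    private final Consumer<String> output; // plays the role of the operator's output
    private List<String> buffer;           // working state, created in open()
    private boolean finished;

    BufferingOperatorSketch(Consumer<String> output) {
        this.output = output;
    }

    /** Initialization logic runs before any element is processed. */
    void open() {
        buffer = new ArrayList<>();
    }

    void processElement(String value) {
        if (finished) {
            throw new IllegalStateException("no records may be processed after finish()");
        }
        buffer.add(value);
    }

    /** Flushes all remaining buffered data downstream; failures would propagate. */
    void finish() {
        for (String value : buffer) {
            output.accept(value);
        }
        buffer.clear();
        finished = true;
    }

    /** Releases resources only; deliberately emits nothing. */
    void close() {
        buffer = null;
    }

    /** Drives the lifecycle; close() runs on success and on failure alike. */
    static List<String> runLifecycle(List<String> input) {
        List<String> emitted = new ArrayList<>();
        BufferingOperatorSketch op = new BufferingOperatorSketch(emitted::add);
        try {
            op.open();
            for (String value : input) {
                op.processElement(value);
            }
            op.finish();
        } finally {
            op.close();
        }
        return emitted;
    }
}
```

Placing close() in a finally block reflects the statement that it is called both after successful completion and after failure or cancellation.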
-
prepareSnapshotPreBarrier
Description copied from interface: StreamOperator
This method is called when the operator should do a snapshot, before it emits its own checkpoint barrier.
This method is intended not for any actual state persistence, but only for emitting some data before emitting the checkpoint barrier. It is meant for operators that maintain some small transient state that is inefficient to checkpoint (especially when it would need to be checkpointed in a re-scalable way) but can simply be sent downstream before the checkpoint. An example is opportunistic pre-aggregation operators, which have small pre-aggregation state that is frequently flushed downstream.
Important: This method should not be used for any actual state snapshot logic, because it will inherently be within the synchronous part of the operator's checkpoint. If heavy work is done within this method, it will affect latency and downstream checkpoint alignments.
- Specified by:
prepareSnapshotPreBarrier in interface StreamOperator<OUT>
- Parameters:
checkpointId - The ID of the checkpoint.
- Throws:
Exception - Throwing an exception here causes the operator to fail and go into recovery.
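The pre-aggregation case mentioned above can be sketched as follows. This is a simplified model, not Flink code: `PreAggregatorSketch`, its string-based output and the `emitBarrier` helper are hypothetical. The point it illustrates is that the partial sums are sent downstream before the barrier, so nothing remains to be checkpointed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-aggregating operator that flushes its partial sums before the barrier. */
class PreAggregatorSketch {
    // Small transient state that would be awkward to checkpoint in a re-scalable way.
    private final Map<String, Long> partialSums = new LinkedHashMap<>();
    private final List<String> downstream = new ArrayList<>();

    void processElement(String key, long value) {
        partialSums.merge(key, value, Long::sum);
    }

    /** Mirrors the role of prepareSnapshotPreBarrier: emit the state, do not persist it. */
    void prepareSnapshotPreBarrier(long checkpointId) {
        partialSums.forEach((key, sum) -> downstream.add(key + "=" + sum));
        partialSums.clear();
    }

    /** Stand-in for the runtime emitting the checkpoint barrier after the flush. */
    void emitBarrier(long checkpointId) {
        downstream.add("barrier-" + checkpointId);
    }

    List<String> downstream() {
        return downstream;
    }

    boolean hasPendingState() {
        return !partialSums.isEmpty();
    }
}
```

The flush is kept cheap on purpose, since this step runs in the synchronous part of the checkpoint.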
-
snapshotState
public OperatorSnapshotFutures snapshotState(long checkpointId, long timestamp, CheckpointOptions checkpointOptions, CheckpointStreamFactory factory) throws Exception
Description copied from interface: StreamOperator
Called to draw a state snapshot from the operator.
- Specified by:
snapshotState in interface StreamOperator<OUT>
- Returns:
a runnable future to the state handle that points to the snapshotted state. For synchronous implementations, the runnable might already be finished.
- Throws:
Exception - exception that happened during snapshotting.
-
snapshotState
Stream operators with state that want to participate in a snapshot need to override this hook method.
- Specified by:
snapshotState in interface StreamOperatorStateHandler.CheckpointedStreamOperator
- Parameters:
context - context that provides information and means required for taking a snapshot
- Throws:
Exception
-
initializeState
Stream operators with state that can be restored need to override this hook method.
- Specified by:
initializeState in interface StreamOperatorStateHandler.CheckpointedStreamOperator
- Parameters:
context - context that allows registering different states.
- Throws:
Exception
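The two hook methods form a pair: what snapshotState writes, initializeState must be able to read back. The sketch below models that round trip with a plain map in place of the real context objects. `CountingOperatorSketch` and the `Map<String, Long>` checkpoint are hypothetical simplifications and do not reflect the actual StateSnapshotContext or StateInitializationContext APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stateful operator whose snapshot and restore hooks use a plain map. */
class CountingOperatorSketch {
    private long count;

    void processElement() {
        count++;
    }

    long count() {
        return count;
    }

    /** Snapshot hook: writes the operator's state into the checkpoint. */
    void snapshotState(Map<String, Long> checkpoint) {
        checkpoint.put("count", count);
    }

    /** Restore hook: reads state back, or starts from zero on a fresh start. */
    void initializeState(Map<String, Long> restored) {
        count = restored.getOrDefault("count", 0L);
    }

    /** Takes a snapshot of this operator and restores it into a new instance. */
    CountingOperatorSketch snapshotAndRestore() {
        Map<String, Long> checkpoint = new HashMap<>();
        snapshotState(checkpoint);
        CountingOperatorSketch restoredOperator = new CountingOperatorSketch();
        restoredOperator.initializeState(checkpoint);
        return restoredOperator;
    }
}
```

An operator that overrides only one of the two hooks would either lose its state on recovery or restore state that was never written.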
-
notifyCheckpointComplete
- Specified by:
notifyCheckpointComplete in interface org.apache.flink.api.common.state.CheckpointListener
- Throws:
Exception
-
notifyCheckpointAborted
- Specified by:
notifyCheckpointAborted in interface org.apache.flink.api.common.state.CheckpointListener
- Throws:
Exception
-
getExecutionConfig
public org.apache.flink.api.common.ExecutionConfig getExecutionConfig()
Gets the execution config defined on the execution environment of the job to which this operator belongs.
- Returns:
The job's execution config.
-
getOperatorConfig
-
getContainingTask
-
getUserCodeClassloader
-
getOperatorName
Returns the operator name. If the runtime context has been set, then the task name with subtask index is returned. Otherwise, the simple class name is returned.
- Returns:
The task name with subtask index if the runtime context is set; otherwise the simple class name.
-
getRuntimeContext
Returns a context that allows the operator to query information about the execution and also to interact with systems such as broadcast variables and managed state. This also allows registering timers.
-
getKeyedStateBackend
-
getOperatorStateBackend
-
getProcessingTimeService
Returns the ProcessingTimeService responsible for getting the current processing time and registering timers.
-
getPartitionedState
protected <S extends org.apache.flink.api.common.state.State> S getPartitionedState(org.apache.flink.api.common.state.StateDescriptor<S, ?> stateDescriptor) throws Exception
Creates a partitioned state handle, using the state backend configured for this task.
- Throws:
IllegalStateException - Thrown, if the key/value state was already initialized.
Exception - Thrown, if the state backend cannot create the key/value state.
-
getOrCreateKeyedState
protected <N,S extends org.apache.flink.api.common.state.State,T> S getOrCreateKeyedState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, org.apache.flink.api.common.state.StateDescriptor<S, T> stateDescriptor) throws Exception
- Throws:
Exception
-
getPartitionedState
public <S extends org.apache.flink.api.common.state.State,N> S getPartitionedState(N namespace, org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, org.apache.flink.api.common.state.StateDescriptor<S, ?> stateDescriptor) throws Exception
Creates a partitioned state handle, using the state backend configured for this task.
- Throws:
IllegalStateException - Thrown, if the key/value state was already initialized.
Exception - Thrown, if the state backend cannot create the key/value state.
-
setKeyContextElement1
- Specified by:
setKeyContextElement1 in interface StreamOperator<OUT>
- Throws:
Exception
-
setKeyContextElement2
- Specified by:
setKeyContextElement2 in interface StreamOperator<OUT>
- Throws:
Exception
-
hasKeyContext1
@Internal public boolean hasKeyContext1()
Description copied from interface: KeyContextHandler
Whether the first input of StreamOperator has "KeyContext". If false, we can omit the call of StreamOperator.setKeyContextElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<?>) for each record that arrives on the first input.
- Specified by:
hasKeyContext1 in interface KeyContextHandler
- Returns:
True if the first input has "KeyContext", false otherwise.
-
hasKeyContext2
@Internal public boolean hasKeyContext2()
Description copied from interface: KeyContextHandler
Whether the second input of StreamOperator has "KeyContext". If false, we can omit the call of StreamOperator.setKeyContextElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<?>) for each record that arrives on the second input.
- Specified by:
hasKeyContext2 in interface KeyContextHandler
- Returns:
True if the second input has "KeyContext", false otherwise.
-
setCurrentKey
- Specified by:
setCurrentKey in interface KeyContext
-
getCurrentKey
- Specified by:
getCurrentKey in interface KeyContext
-
getKeyedStateStore
public org.apache.flink.api.common.state.KeyedStateStore getKeyedStateStore() -
getStateKeySelector1
protected org.apache.flink.api.java.functions.KeySelector<?,?> getStateKeySelector1() -
getStateKeySelector2
protected org.apache.flink.api.java.functions.KeySelector<?,?> getStateKeySelector2() -
processLatencyMarker
- Throws:
Exception
-
processLatencyMarker1
- Throws:
Exception
-
processLatencyMarker2
- Throws:
Exception
-
reportOrForwardLatencyMarker
-
getInternalTimerService
public <K,N> InternalTimerService<N> getInternalTimerService(String name, org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, Triggerable<K, N> triggerable)
Returns an InternalTimerService that can be used to query current processing time and event time and to set timers. An operator can have several timer services, where each has its own namespace serializer. Timer services are differentiated by the string key that is given when requesting them; if you call this method with the same key multiple times, you will get the same timer service instance in subsequent requests.
Timers are always scoped to a key, the currently active key of a keyed stream operation. When a timer fires, this key will also be set as the currently active key.
Each timer has attached metadata, the namespace. Different timer services can have a different namespace type. If you don't need namespace differentiation you can use VoidNamespaceSerializer as the namespace serializer.
- Type Parameters:
N - The type of the timer namespace.
- Parameters:
name - The name of the requested timer service. If no service exists under the given name a new one will be created and returned.
namespaceSerializer - TypeSerializer for the timer namespace.
triggerable - The Triggerable that should be invoked when timers fire
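The behaviour described above (one service per name, timers scoped to the active key, and the key restored when a timer fires) can be modelled in a few lines. The sketch below is a simplified illustration that does not use Flink classes. `TimerServiceSketch`, `Service`, `Timer` and `Callback` are hypothetical names, the namespace is reduced to a plain string, and the real InternalTimerService interface is considerably richer.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Simplified, hypothetical model of named timer services; not Flink code. */
class TimerServiceSketch {

    /** A timer carries its key, its namespace metadata and its timestamp. */
    static final class Timer {
        final Object key;
        final String namespace;
        final long timestamp;

        Timer(Object key, String namespace, long timestamp) {
            this.key = key;
            this.namespace = namespace;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical callback, loosely playing the role of Triggerable. */
    interface Callback {
        void onTimer(Object currentKey, Timer timer);
    }

    static final class Service {
        private final PriorityQueue<Timer> timers =
                new PriorityQueue<>(Comparator.comparingLong((Timer t) -> t.timestamp));
        private final Callback callback;
        private Object currentKey;

        Service(Callback callback) {
            this.callback = callback;
        }

        void setCurrentKey(Object key) {
            this.currentKey = key;
        }

        /** Registers a timer scoped to the currently active key. */
        void registerTimer(String namespace, long timestamp) {
            timers.add(new Timer(currentKey, namespace, timestamp));
        }

        /** Fires all timers due at or before the given time, earliest first. */
        void advanceTo(long time) {
            while (!timers.isEmpty() && timers.peek().timestamp <= time) {
                Timer timer = timers.poll();
                currentKey = timer.key; // the timer's key becomes the active key
                callback.onTimer(currentKey, timer);
            }
        }
    }

    private final Map<String, Service> services = new HashMap<>();

    /** Returns the existing service for this name, or creates one on first request. */
    Service getTimerService(String name, Callback callback) {
        return services.computeIfAbsent(name, n -> new Service(callback));
    }
}
```

In this model a second request with the same name ignores the new callback and returns the service created by the first request, which is one simple way to satisfy the "same instance" guarantee.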
-
processWatermark
- Throws:
Exception
-
processWatermark1
- Throws:
Exception
-
processWatermark2
- Throws:
Exception
-
processWatermarkStatus
- Throws:
Exception
-
processWatermarkStatus
- Throws:
Exception
-
processWatermarkStatus1
- Throws:
Exception
-
processWatermarkStatus2
- Throws:
Exception
-
getOperatorID
- Specified by:
getOperatorID in interface StreamOperator<OUT>
-
getTimeServiceManager
-
processRecordAttributes
@Experimental public void processRecordAttributes(RecordAttributes recordAttributes) throws Exception
- Throws:
Exception
-
processRecordAttributes1
-
processRecordAttributes2
-
processWatermark
- Throws:
Exception
-
processWatermark1
- Throws:
Exception
-
processWatermark2
- Throws:
Exception
-