Class ContinuousFileReaderOperator<OUT,T extends TimestampedInputSplit>
- All Implemented Interfaces:
Serializable, org.apache.flink.api.common.state.CheckpointListener, Input<T>, KeyContext, KeyContextHandler, OneInputStreamOperator<T,OUT>, org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>, StreamOperator<OUT>, StreamOperatorStateHandler.CheckpointedStreamOperator, YieldingOperator<OUT>
This operator reads the splits received from the preceding ContinuousFileMonitoringFunction. Unlike the ContinuousFileMonitoringFunction, which has a parallelism of 1, this operator can have a degree of parallelism (DOP) greater than 1.
This implementation uses a MailboxExecutor to execute each action and follows a state machine approach. The workflow is the following:
- start in IDLE
- upon receiving a split add it to the queue, switch to OPENING and enqueue a mail to process it
- open file, switch to READING, read one record, re-enqueue self
- if no more records or splits available, switch back to IDLE
On close:
- if IDLE then close immediately
- otherwise switch to CLOSING, call yield in a loop until state is CLOSED
- yield() causes remaining records (and splits) to be processed in the same way as above
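The workflow and the close procedure described above can be sketched as a small single-threaded simulation. This is a toy model, not Flink's implementation: the class ReaderStateMachineSketch, its methods, and the representation of a split as a list of strings are all hypothetical, and a plain queue of Runnable stands in for the MailboxExecutor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

/** Toy model of the reader workflow (hypothetical; does not use Flink classes). */
class ReaderStateMachineSketch {
    enum State { IDLE, OPENING, READING, CLOSING, CLOSED }

    // Stand-in for the mailbox: mails run one at a time on the calling thread.
    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    // Pending splits; each hypothetical split is just a list of records.
    private final Queue<List<String>> splits = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();
    private Iterator<String> currentSplit; // null when no split is open
    private State state = State.IDLE;

    State state() { return state; }
    List<String> output() { return output; }
    int pendingMails() { return mailbox.size(); }

    /** Upon receiving a split: queue it and, if idle, enqueue a mail to process it. */
    void addSplit(List<String> split) {
        splits.add(split);
        if (state == State.IDLE) {
            state = State.OPENING;
            mailbox.add(this::processRecord);
        }
    }

    /** One mail: open the next split if needed, read one record, re-enqueue self. */
    private void processRecord() {
        if (currentSplit == null && !splits.isEmpty()) {
            currentSplit = splits.remove().iterator();
            if (state == State.OPENING) {
                state = State.READING;
            }
        }
        if (currentSplit != null && currentSplit.hasNext()) {
            output.add(currentSplit.next());
        } else {
            currentSplit = null; // current split is exhausted
        }
        if (currentSplit != null) {
            mailbox.add(this::processRecord);
        } else if (!splits.isEmpty()) {
            if (state == State.READING) {
                state = State.OPENING;
            }
            mailbox.add(this::processRecord);
        } else {
            // No more records or splits: go idle, or finish closing.
            state = (state == State.CLOSING) ? State.CLOSED : State.IDLE;
        }
    }

    /** On close: finish immediately if idle, otherwise drain the mailbox. */
    void close() {
        if (state == State.IDLE) {
            state = State.CLOSED;
            return;
        }
        state = State.CLOSING;
        while (state != State.CLOSED) {
            mailbox.remove().run(); // stand-in for yield()
        }
    }

    public static void main(String[] args) {
        ReaderStateMachineSketch reader = new ReaderStateMachineSketch();
        reader.addSplit(Arrays.asList("a", "b"));
        reader.addSplit(Arrays.asList("c"));
        reader.close();
        System.out.println(reader.output() + " " + reader.state());
    }
}
```

Because every action runs as a mail on one thread and each mail re-enqueues at most one successor, the sketch needs no locks and never holds more than one pending mail.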
Using a MailboxExecutor avoids the need for explicit synchronization. At most one mail
should be enqueued at any given time.
Using an FSM approach makes it possible to define states explicitly and enforce transitions between them.
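One way to define states explicitly and enforce transitions is an enum that knows its legal successors. The sketch below is illustrative only: the enum ReaderState, the method names, and the exact transition table are assumptions inferred from the workflow description, not taken from the operator's source.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical state enum; the transition table is an assumption, not Flink's. */
enum ReaderState {
    IDLE, OPENING, READING, CLOSING, CLOSED;

    /** States that may legally follow this one. */
    Set<ReaderState> successors() {
        switch (this) {
            case IDLE:
                return EnumSet.of(OPENING, CLOSED);
            case OPENING:
                return EnumSet.of(READING, CLOSING);
            case READING:
                return EnumSet.of(IDLE, OPENING, CLOSING);
            case CLOSING:
                return EnumSet.of(CLOSED);
            default:
                return EnumSet.noneOf(ReaderState.class); // CLOSED is terminal
        }
    }

    /** Returns the next state, or throws if the transition is not allowed. */
    ReaderState transitionTo(ReaderState next) {
        if (!successors().contains(next)) {
            throw new IllegalStateException("Illegal transition: " + this + " -> " + next);
        }
        return next;
    }

    public static void main(String[] args) {
        ReaderState s = IDLE.transitionTo(OPENING).transitionTo(READING);
        System.out.println(s);
    }
}
```

Routing every state change through a single method such as transitionTo means an unexpected transition fails fast instead of silently corrupting the reader's state.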
Field Summary
Fields inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
combinedWatermark, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager -
Method Summary
Modifier and Type, Method, Description:
- void close()
  This method is called at the very end of the operator's life, both in the case of a successful completion of the operation, and in the case of a failure and canceling.
- void finish()
  This method is called at the end of data processing.
- void initializeState
  Stream operators with state which can be restored need to override this hook method.
- void open()
  This method is called immediately before any elements are processed, it should contain the operator's initialization logic, e.g. state initialization.
- void processElement(StreamRecord<T> element)
  Processes one element that arrived on this input of the MultipleInputStreamOperator.
- void processWatermark(Watermark mark)
  Processes a Watermark that arrived on the first input of this two-input operator.
- void setOutputType(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outTypeInfo, org.apache.flink.api.common.ExecutionConfig executionConfig)
- void snapshotState(StateSnapshotContext context)
  Stream operators with state, which want to participate in a snapshot need to override this hook method.
Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
beforeInitializeStateHandler, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, isAsyncStateProcessingEnabled, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, useSplittableTimers
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.flink.api.common.state.CheckpointListener
notifyCheckpointAborted, notifyCheckpointComplete
Methods inherited from interface org.apache.flink.streaming.api.operators.Input
processLatencyMarker, processRecordAttributes, processWatermark, processWatermarkStatus
Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContext
getCurrentKey, setCurrentKey
Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler
hasKeyContext
Methods inherited from interface org.apache.flink.streaming.api.operators.OneInputStreamOperator
setKeyContextElement
Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator
getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState
-
Method Details
-
initializeState
Description copied from class: AbstractStreamOperator
Stream operators with state which can be restored need to override this hook method.
- Specified by: initializeState in interface StreamOperatorStateHandler.CheckpointedStreamOperator
- Overrides: initializeState in class AbstractStreamOperator<OUT>
- Parameters: context - context that allows to register different states.
- Throws: Exception
-
open
Description copied from class: AbstractStreamOperator
This method is called immediately before any elements are processed, it should contain the operator's initialization logic, e.g. state initialization.
The default implementation does nothing.
- Specified by: open in interface StreamOperator<OUT>
- Overrides: open in class AbstractStreamOperator<OUT>
- Throws: Exception - An exception in this method causes the operator to fail.
-
processElement
Description copied from interface: Input
Processes one element that arrived on this input of the MultipleInputStreamOperator. This method is guaranteed to not be called concurrently with other methods of the operator.
- Specified by: processElement in interface Input<OUT>
- Throws: Exception
-
processWatermark
Description copied from interface: Input
Processes a Watermark that arrived on the first input of this two-input operator. This method is guaranteed to not be called concurrently with other methods of the operator.
- Specified by: processWatermark in interface Input<OUT>
- Overrides: processWatermark in class AbstractStreamOperator<OUT>
- Throws: Exception
-
finish
Description copied from interface: StreamOperator
This method is called at the end of data processing.
The method is expected to flush all remaining buffered data. Exceptions during this flushing of buffered data should be propagated, in order to cause the operation to be recognized as failed, because the last data items are not processed properly.
After this method is called, no more records can be produced for the downstream operators.
WARNING: It is not safe to use this method to commit any transactions or other side effects! You can use this method to flush any buffered data that can later on be committed e.g. in a CheckpointListener.notifyCheckpointComplete(long).
NOTE: This method does not need to close any resources. You should release external resources in the StreamOperator.close() method.
- Specified by: finish in interface StreamOperator<OUT>
- Overrides: finish in class AbstractStreamOperator<OUT>
- Throws: Exception - An exception in this method causes the operator to fail.
-
close
Description copied from interface: StreamOperator
This method is called at the very end of the operator's life, both in the case of a successful completion of the operation, and in the case of a failure and canceling.
This method is expected to make a thorough effort to release all resources that the operator has acquired.
NOTE: It can not emit any records! If you need to emit records at the end of processing, do so in the StreamOperator.finish() method.
- Specified by: close in interface StreamOperator<OUT>
- Overrides: close in class AbstractStreamOperator<OUT>
- Throws: Exception
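The division of labour between finish() and close() can be illustrated with a plain-Java sketch. The class BufferingOperatorSketch is hypothetical and does not extend any Flink type; it only mirrors the documented contract: finish() flushes buffered records downstream, while close() releases resources on both the success and the failure path and never emits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operator mirroring the finish/close contract; not a Flink class. */
class BufferingOperatorSketch {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> downstream; // stand-in for the operator's output
    private boolean resourceOpen;          // stand-in for an external resource

    BufferingOperatorSketch(List<String> downstream) {
        this.downstream = downstream;
    }

    boolean isResourceOpen() { return resourceOpen; }

    /** Initialization logic, run before any element is processed. */
    void open() {
        resourceOpen = true;
    }

    /** Buffers the element instead of emitting it right away. */
    void processElement(String element) {
        buffer.add(element);
    }

    /** End of data processing: flush everything that is still buffered. */
    void finish() {
        downstream.addAll(buffer);
        buffer.clear();
    }

    /** End of life (success, failure, or cancel): release resources, emit nothing. */
    void close() {
        buffer.clear();
        resourceOpen = false;
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        BufferingOperatorSketch op = new BufferingOperatorSketch(out);
        op.open();
        op.processElement("x");
        op.finish(); // records reach downstream here
        op.close();  // only cleanup happens here
        System.out.println(out + " resourceOpen=" + op.isResourceOpen());
    }
}
```

On a failure or cancellation path only close() would run, so in this sketch anything still buffered is dropped rather than emitted.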
-
snapshotState
Description copied from class: AbstractStreamOperator
Stream operators with state, which want to participate in a snapshot need to override this hook method.
- Specified by: snapshotState in interface StreamOperatorStateHandler.CheckpointedStreamOperator
- Overrides: snapshotState in class AbstractStreamOperator<OUT>
- Parameters: context - context that provides information and means required for taking a snapshot
- Throws: Exception
-
setOutputType
public void setOutputType(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outTypeInfo, org.apache.flink.api.common.ExecutionConfig executionConfig)
- Specified by: setOutputType in interface org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>
-