Class ContinuousFileReaderOperator<OUT,T extends TimestampedInputSplit>

java.lang.Object
org.apache.flink.streaming.api.operators.AbstractStreamOperator<OUT>
org.apache.flink.streaming.api.functions.source.ContinuousFileReaderOperator<OUT,T>
All Implemented Interfaces:
Serializable, org.apache.flink.api.common.state.CheckpointListener, Input<T>, KeyContext, KeyContextHandler, OneInputStreamOperator<T,OUT>, org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>, StreamOperator<OUT>, StreamOperatorStateHandler.CheckpointedStreamOperator, YieldingOperator<OUT>

@Internal public class ContinuousFileReaderOperator<OUT,T extends TimestampedInputSplit> extends AbstractStreamOperator<OUT> implements OneInputStreamOperator<T,OUT>, org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>
The operator that reads the splits received from the preceding ContinuousFileMonitoringFunction. Unlike the ContinuousFileMonitoringFunction, which has a parallelism of 1, this operator can have a degree of parallelism (DOP) greater than 1.

This implementation uses a MailboxExecutor to execute each action and follows a state machine approach. The workflow is as follows:

  1. start in IDLE
  2. upon receiving a split add it to the queue, switch to OPENING and enqueue a mail to process it
  3. open file, switch to READING, read one record, re-enqueue self
  4. if no more records or splits available, switch back to IDLE

On close:

  1. if IDLE then close immediately
  2. otherwise switch to CLOSING, call yield in a loop until state is CLOSED
  3. yield() causes remaining records (and splits) to be processed in the same way as above

Using MailboxExecutor avoids the need for explicit synchronization. At most one mail should be enqueued at any given time.

Using the FSM (finite state machine) approach makes it possible to define states explicitly and to enforce the transitions between them.
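
The workflow and close sequence above can be sketched in plain Java. This is a minimal, single-threaded illustration and not the operator's actual code: the class ReaderFsmSketch, its methods, and the use of a List<String> as a "split" are all hypothetical, and a simple queue of Runnables stands in for MailboxExecutor. As a simplification, opening a split and reading its first record are separate mails, and the CLOSING state opens remaining splits inline.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Plain-Java illustration only; none of these names are Flink APIs. */
class ReaderFsmSketch {
    enum State { IDLE, OPENING, READING, CLOSING, CLOSED }

    private final Deque<Runnable> mailbox = new ArrayDeque<>();    // stand-in for the mailbox
    private final Deque<List<String>> splits = new ArrayDeque<>(); // pending "splits"
    private final List<String> output = new ArrayList<>();         // emitted "records"
    private Iterator<String> current;                              // currently open split
    private State state = State.IDLE;

    State state() { return state; }
    List<String> output() { return output; }

    /** Queue the split; if IDLE, switch to OPENING and enqueue a mail. */
    void addSplit(List<String> split) {
        if (state == State.CLOSING || state == State.CLOSED) {
            throw new IllegalStateException("operator is closing or closed");
        }
        splits.add(split);
        if (state == State.IDLE) {
            state = State.OPENING;
            enqueue();
        }
    }

    /** Runs one pending mail, if any; returns false when the mailbox is empty. */
    boolean yieldOnce() {
        Runnable mail = mailbox.poll();
        if (mail == null) {
            return false;
        }
        mail.run();
        return true;
    }

    /** If IDLE, close immediately; otherwise drain the remaining work by yielding. */
    void close() {
        if (state == State.CLOSED) {
            return;
        }
        if (state == State.IDLE) {
            state = State.CLOSED;
            return;
        }
        state = State.CLOSING;
        while (state != State.CLOSED) {
            if (!yieldOnce()) {
                throw new IllegalStateException("no mail pending while closing");
            }
        }
    }

    /** Enforces the "at most one mail enqueued" invariant. */
    private void enqueue() {
        if (!mailbox.isEmpty()) {
            throw new IllegalStateException("more than one mail enqueued");
        }
        mailbox.add(this::step);
    }

    /** One unit of work; re-enqueues itself while there is more to do. */
    private void step() {
        switch (state) {
            case OPENING:
                current = splits.poll().iterator();
                state = State.READING;
                enqueue();
                break;
            case READING:
                if (current.hasNext()) {
                    output.add(current.next());
                    enqueue();
                } else if (!splits.isEmpty()) {
                    state = State.OPENING;
                    enqueue();
                } else {
                    current = null;
                    state = State.IDLE;
                }
                break;
            case CLOSING:
                if (current != null && current.hasNext()) {
                    output.add(current.next());
                    enqueue();
                } else if (!splits.isEmpty()) {
                    current = splits.poll().iterator();
                    enqueue();
                } else {
                    current = null;
                    state = State.CLOSED;
                }
                break;
            default:
                throw new IllegalStateException("unexpected mail in state " + state);
        }
    }

    public static void main(String[] args) {
        ReaderFsmSketch reader = new ReaderFsmSketch();
        reader.addSplit(List.of("a", "b"));
        reader.addSplit(List.of("c"));
        reader.close();
        System.out.println(reader.output() + " " + reader.state());
    }
}
```

Because every action runs as a mail on the same thread, the sketch needs no locks, and the check in enqueue() makes a violation of the single-mail invariant fail fast.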

  • Method Details

    • initializeState

      public void initializeState(StateInitializationContext context) throws Exception
      Description copied from class: AbstractStreamOperator
      Stream operators with state that can be restored need to override this hook method.
      Specified by:
      initializeState in interface StreamOperatorStateHandler.CheckpointedStreamOperator
      Overrides:
      initializeState in class AbstractStreamOperator<OUT>
      Parameters:
      context - context that allows registering different states.
      Throws:
      Exception
    • open

      public void open() throws Exception
      Description copied from class: AbstractStreamOperator
      This method is called immediately before any elements are processed. It should contain the operator's initialization logic, e.g. state initialization.

      The default implementation does nothing.

      Specified by:
      open in interface StreamOperator<OUT>
      Overrides:
      open in class AbstractStreamOperator<OUT>
      Throws:
      Exception - An exception in this method causes the operator to fail.
    • processElement

      public void processElement(StreamRecord<T> element) throws Exception
      Description copied from interface: Input
      Processes one element that arrived on this input of the MultipleInputStreamOperator. This method is guaranteed to not be called concurrently with other methods of the operator.
      Specified by:
      processElement in interface Input<OUT>
      Throws:
      Exception
    • processWatermark

      public void processWatermark(Watermark mark) throws Exception
      Description copied from interface: Input
      Processes a Watermark that arrived on the first input of this two-input operator. This method is guaranteed to not be called concurrently with other methods of the operator.
      Specified by:
      processWatermark in interface Input<OUT>
      Overrides:
      processWatermark in class AbstractStreamOperator<OUT>
      Throws:
      Exception
    • finish

      public void finish() throws Exception
      Description copied from interface: StreamOperator
      This method is called at the end of data processing.

      The method is expected to flush all remaining buffered data. Exceptions during this flushing of buffered data should be propagated, in order to cause the operation to be recognized as failed, because the last data items are not processed properly.

      After this method is called, no more records can be produced for the downstream operators.

      WARNING: It is not safe to use this method to commit any transactions or other side effects! You can use this method to flush any buffered data that can be committed later, e.g. in CheckpointListener.notifyCheckpointComplete(long).

      NOTE: This method does not need to close any resources. You should release external resources in the StreamOperator.close() method.

      Specified by:
      finish in interface StreamOperator<OUT>
      Overrides:
      finish in class AbstractStreamOperator<OUT>
      Throws:
      Exception - An exception in this method causes the operator to fail.
    • close

      public void close() throws Exception
      Description copied from interface: StreamOperator
      This method is called at the very end of the operator's life, both in the case of a successful completion of the operation, and in the case of a failure and canceling.

      This method is expected to make a thorough effort to release all resources that the operator has acquired.

      NOTE: It cannot emit any records! If you need to emit records at the end of processing, do so in the StreamOperator.finish() method.

      Specified by:
      close in interface StreamOperator<OUT>
      Overrides:
      close in class AbstractStreamOperator<OUT>
      Throws:
      Exception
    • snapshotState

      public void snapshotState(StateSnapshotContext context) throws Exception
      Description copied from class: AbstractStreamOperator
      Stream operators with state that want to participate in a snapshot need to override this hook method.
      Specified by:
      snapshotState in interface StreamOperatorStateHandler.CheckpointedStreamOperator
      Overrides:
      snapshotState in class AbstractStreamOperator<OUT>
      Parameters:
      context - context that provides information and means required for taking a snapshot
      Throws:
      Exception
    • setOutputType

      public void setOutputType(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outTypeInfo, org.apache.flink.api.common.ExecutionConfig executionConfig)
      Specified by:
      setOutputType in interface org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>
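
The division of labor between finish() and close() described in the method details above can be illustrated with a small plain-Java sketch. It does not use any Flink classes: LifecycleSketch, its run driver, and the poison parameter used to simulate a failure are all hypothetical names chosen for this example. It only assumes what the descriptions state, namely that finish() flushes buffered data at the end of successful processing, and that close() runs in every case, releases resources, and must not emit records.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration only; this is not a Flink operator. */
class LifecycleSketch {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> downstream;
    private boolean resourcesReleased;

    LifecycleSketch(List<String> downstream) {
        this.downstream = downstream;
    }

    /** Buffers the record instead of emitting it right away. */
    void processElement(String record) {
        buffer.add(record);
    }

    /** End of data: flush buffered records; resources are not released here. */
    void finish() {
        downstream.addAll(buffer);
        buffer.clear();
    }

    /** End of life: release resources; buffered records are dropped, never emitted. */
    void close() {
        buffer.clear();
        resourcesReleased = true;
    }

    boolean resourcesReleased() {
        return resourcesReleased;
    }

    /**
     * Hypothetical driver: finish() is called only on success, close() always.
     * A record equal to {@code poison} simulates a failure during processing.
     */
    static boolean run(LifecycleSketch op, List<String> input, String poison) {
        try {
            for (String record : input) {
                if (record.equals(poison)) {
                    throw new IllegalStateException("simulated failure");
                }
                op.processElement(record);
            }
            op.finish();
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            op.close();
        }
    }
}
```

In the failure case the buffered records never reach downstream, yet close() still runs, which is why cleanup belongs in close() rather than in finish().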