Class WatermarkStatus
java.lang.Object
org.apache.flink.streaming.runtime.streamrecord.StreamElement
org.apache.flink.streaming.runtime.watermarkstatus.WatermarkStatus
A Watermark Status element informs stream tasks whether or not they should continue to expect
watermarks from the input stream that sent them. There are 2 kinds of status, namely
IDLE and ACTIVE. Watermark Status elements are generated
at the sources, and may be propagated through the tasks of the topology. They directly infer the
current status of the emitting task; a SourceStreamTask or StreamTask emits a
IDLE if it will temporarily halt to emit any watermarks (i.e. is idle),
and emits a ACTIVE once it resumes to do so (i.e. is active). Tasks are
responsible for propagating their status further downstream once they toggle between being idle
and active. The cases that source tasks and downstream tasks are considered either idle or active
is explained below:
- Source tasks: A source task is considered to be idle if its head operator, i.e. a
StreamSource, will not emit watermarks for an indefinite amount of time. This is the case, for example, for Flink's Kafka Consumer, where sources might initially have no assigned partitions to read from, or no records can be read from the assigned partitions. Once the headStreamSourceoperator detects that it will resume emitting data, the source task is considered to be active.StreamSources are responsible for toggling the status of the containing source task and ensuring that no watermarks will be emitted while the task is idle. This guarantee should be enforced on sources throughSourceFunction.SourceContextimplementations. - Downstream tasks: a downstream task is considered to be idle if all its input streams are
idle, i.e. the last received Watermark Status element from all input streams is a
IDLE. As long as one of its input streams is active, i.e. the last received Watermark Status element from the input stream isACTIVE, the task is active.
Watermark Status elements received at downstream tasks also affect and control how their
operators process and advance their watermarks. The below describes the effects (the logic is
implemented as a StatusWatermarkValve which downstream tasks should use for such
purposes):
- Since there may be watermark generators that might produce watermarks anywhere in the middle of topologies regardless of whether there are input data at the operator, the current status of the task must be checked before forwarding watermarks emitted from an operator. If the status is actually idle, the watermark must be blocked.
- For downstream tasks with multiple input streams, the watermarks of input streams that are temporarily idle, or has resumed to be active but its watermark is behind the overall min watermark of the operator, should not be accounted for when deciding whether or not to advance the watermark and propagated through the operator chain.
Note that to notify downstream tasks that a source task is permanently closed and will no
longer send any more elements, the source should still send a Watermark.MAX_WATERMARK
instead of IDLE. Watermark Status elements only serve as markers for
temporary status.
-
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final WatermarkStatusstatic final intstatic final WatermarkStatusstatic final intfinal int -
Constructor Summary
Constructors -
Method Summary
Methods inherited from class org.apache.flink.streaming.runtime.streamrecord.StreamElement
asLatencyMarker, asRecord, asRecordAttributes, asWatermark, asWatermarkStatus, isLatencyMarker, isRecord, isRecordAttributes, isWatermark, isWatermarkStatus
-
Field Details
-
IDLE_STATUS
public static final int IDLE_STATUS- See Also:
-
ACTIVE_STATUS
public static final int ACTIVE_STATUS- See Also:
-
IDLE
-
ACTIVE
-
status
public final int status
-
-
Constructor Details
-
WatermarkStatus
public WatermarkStatus(int status)
-
-
Method Details