Class CollectSinkFunction<IN>
- Type Parameters:
IN - type of results to be written into the sink.
- All Implemented Interfaces:
Serializable, org.apache.flink.api.common.functions.Function, org.apache.flink.api.common.functions.RichFunction, org.apache.flink.api.common.state.CheckpointListener, CheckpointedFunction, SinkFunction<IN>
This sink limits the number of results it buffers (the limit is configurable), so that when the buffer is full it back-pressures the job until the client consumes some results.
NOTE: When using this sink, make sure that its parallelism is 1 and that it is used in a StreamTask.
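The back-pressure behaviour described above can be illustrated with a small bounded buffer. This is only a sketch under stated assumptions, not the sink's actual implementation: the class name BoundedResultBufferSketch, its methods, and the byte-based limit are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a byte-bounded result buffer; not Flink's actual implementation.
class BoundedResultBufferSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final ArrayDeque<byte[]> buffer = new ArrayDeque<>();
    private final long maxBufferedBytes; // assumed configurable limit
    private long bufferedBytes = 0;

    BoundedResultBufferSketch(long maxBufferedBytes) {
        this.maxBufferedBytes = maxBufferedBytes;
    }

    // Producer side (the job): blocks while the buffer is full, which back-pressures the caller.
    void put(byte[] serializedRecord) throws InterruptedException {
        lock.lock();
        try {
            while (bufferedBytes >= maxBufferedBytes) {
                notFull.await();
            }
            buffer.addLast(serializedRecord);
            bufferedBytes += serializedRecord.length;
        } finally {
            lock.unlock();
        }
    }

    // Consumer side (the client): removes up to maxRecords and wakes any blocked producer.
    List<byte[]> poll(int maxRecords) {
        lock.lock();
        try {
            List<byte[]> batch = new ArrayList<>();
            while (batch.size() < maxRecords && !buffer.isEmpty()) {
                byte[] record = buffer.pollFirst();
                bufferedBytes -= record.length;
                batch.add(record);
            }
            if (bufferedBytes < maxBufferedBytes) {
                notFull.signalAll();
            }
            return batch;
        } finally {
            lock.unlock();
        }
    }

    boolean isFull() {
        lock.lock();
        try {
            return bufferedBytes >= maxBufferedBytes;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedResultBufferSketch buf = new BoundedResultBufferSketch(4);
        buf.put(new byte[] {1, 2});
        buf.put(new byte[] {3, 4});
        System.out.println("full after two records: " + buf.isFull());
        buf.poll(1);
        System.out.println("full after client consumed one: " + buf.isFull());
    }
}
```

In this sketch a further put on a full buffer would block until poll frees space, which is the sense in which a full buffer back-pressures the job.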
Communication Protocol Explanation
We maintain the following variables in this communication protocol:
- version: This variable is set to a random value when the sink opens. The client discovers that the sink has restarted if this variable changes.
- offset: This indicates that the client has successfully received all results before this offset. The sink can safely throw these results away.
- lastCheckpointedOffset: This is the value of offset when the checkpoint happens. This value is restored from the checkpoint and set back to offset when the sink restarts. Clients that need exactly-once semantics rely on this value as the position to revert to when a failover happens.
The client puts version and offset into the request, indicating which version it believes is current and how many results it has received so far.
The sink checks the validity of the request. If version mismatches or offset is smaller than expected, the sink sends back the current version and lastCheckpointedOffset with an empty result list.
If the request is valid, the sink prepares some results starting from offset and sends them back to the client together with lastCheckpointedOffset. If there are currently no results starting from offset, the sink does not wait but instead sends back an empty result list.
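The sink-side handling of a request can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: SinkProtocolSketch, Response, handle, add, checkpoint and the record-count batchSize are hypothetical names, results are plain strings rather than serialized bytes, and the version is passed in rather than generated randomly so that the example is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the sink side of the protocol; not Flink's actual implementation.
class SinkProtocolSketch {
    static final class Response {
        final String version;
        final long lastCheckpointedOffset;
        final List<String> results;

        Response(String version, long lastCheckpointedOffset, List<String> results) {
            this.version = version;
            this.lastCheckpointedOffset = lastCheckpointedOffset;
            this.results = results;
        }
    }

    private final String version;   // set once when the sink "opens"
    private final int batchSize;    // assumed limit on results per response
    private final List<String> buffer = new ArrayList<>(); // results starting at `offset`
    private long offset = 0;        // offset of buffer.get(0)
    private long lastCheckpointedOffset = 0;

    SinkProtocolSketch(String version, int batchSize) {
        this.version = version;
        this.batchSize = batchSize;
    }

    void add(String result) {
        buffer.add(result);
    }

    // Records the value of offset at the time of the checkpoint.
    void checkpoint() {
        lastCheckpointedOffset = offset;
    }

    Response handle(String requestVersion, long requestOffset) {
        // Invalid request: reply with the current version and an empty result list.
        if (!version.equals(requestVersion) || requestOffset < offset) {
            return new Response(version, lastCheckpointedOffset, Collections.<String>emptyList());
        }
        // The client has received everything before requestOffset, so discard it.
        int acknowledged = (int) Math.min(requestOffset - offset, buffer.size());
        buffer.subList(0, acknowledged).clear();
        offset += acknowledged;
        // Send whatever is available without waiting; this may be an empty list.
        int n = Math.min(batchSize, buffer.size());
        return new Response(version, lastCheckpointedOffset, new ArrayList<>(buffer.subList(0, n)));
    }

    public static void main(String[] args) {
        SinkProtocolSketch sink = new SinkProtocolSketch("v1", 2);
        sink.add("a");
        sink.add("b");
        sink.add("c");
        System.out.println(sink.handle("stale-version", 0).results);
        System.out.println(sink.handle("v1", 0).results);
        System.out.println(sink.handle("v1", 2).results);
    }
}
```

Note that in this sketch results are only discarded once a later request acknowledges them through its offset, so a client that repeats a request receives the same batch again.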
For a client that wants exactly-once semantics, when receiving the response, the client checks the following conditions:
- If the version mismatches, the client knows that the sink has restarted. It throws away all uncheckpointed results after lastCheckpointedOffset.
- If lastCheckpointedOffset increases, the client knows that a checkpoint has happened. It can now move all results before this offset to a user-visible buffer.
- If the response also contains new results, the client moves these new results into the uncheckpointed buffer.
Note that
- the user can only see results before a lastCheckpointedOffset, and
- the client will go back to the latest lastCheckpointedOffset when the sink restarts,
so the client will never throw away results in the user-visible buffer. This communication protocol therefore achieves exactly-once semantics.
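The client-side checks listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual client: ExactlyOnceClientSketch, onResponse, nextOffset and the two string lists are hypothetical, and results are plain strings for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical client-side sketch of the exactly-once checks; not Flink's actual client.
class ExactlyOnceClientSketch {
    private String version = "";        // version the client believes is current
    private long offset = 0;            // number of results received so far
    private long userVisibleOffset = 0; // results before this offset are user-visible
    private final List<String> uncheckpointed = new ArrayList<>();
    private final List<String> userVisible = new ArrayList<>();

    void onResponse(String responseVersion, long lastCheckpointedOffset, List<String> results) {
        // 1. Version mismatch: the sink restarted, so drop everything after lastCheckpointedOffset.
        if (!version.equals(responseVersion)) {
            int keep = (int) Math.max(0,
                    Math.min(lastCheckpointedOffset - userVisibleOffset, uncheckpointed.size()));
            uncheckpointed.subList(keep, uncheckpointed.size()).clear();
            offset = userVisibleOffset + keep;
            version = responseVersion;
        }
        // 2. lastCheckpointedOffset increased: results before it become user-visible.
        if (lastCheckpointedOffset > userVisibleOffset) {
            int move = (int) Math.min(
                    lastCheckpointedOffset - userVisibleOffset, uncheckpointed.size());
            userVisible.addAll(uncheckpointed.subList(0, move));
            uncheckpointed.subList(0, move).clear();
            userVisibleOffset += move;
        }
        // 3. New results go into the uncheckpointed buffer.
        uncheckpointed.addAll(results);
        offset += results.size();
    }

    String version() {
        return version;
    }

    // The offset to put into the next request.
    long nextOffset() {
        return offset;
    }

    List<String> userVisible() {
        return Collections.unmodifiableList(userVisible);
    }

    public static void main(String[] args) {
        ExactlyOnceClientSketch client = new ExactlyOnceClientSketch();
        client.onResponse("v1", 0, Collections.<String>emptyList()); // learn the version
        client.onResponse("v1", 0, Arrays.asList("a", "b"));
        client.onResponse("v1", 1, Arrays.asList("c"));              // checkpoint covers "a"
        client.onResponse("v2", 1, Collections.<String>emptyList()); // sink restarted
        System.out.println(client.userVisible() + " next offset " + client.nextOffset());
    }
}
```

The user-visible list is only ever appended to, which mirrors the argument above: results that the user has seen are never retracted, while uncheckpointed results may be dropped and fetched again after a restart.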
In order not to block the job from finishing or being cancelled, if there are still results in the sink's buffer when the job terminates, these results are sent back to the client through accumulators.
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.flink.streaming.api.functions.sink.legacy.SinkFunction
SinkFunction.Context -
Constructor Summary
Constructors
- CollectSinkFunction(org.apache.flink.api.common.typeutils.TypeSerializer<IN> serializer, long maxBytesPerBatch, String accumulatorName)
Method Summary
Modifier and Type, Method, Description
- void accumulateFinalResults()
- void close()
- static org.apache.flink.api.java.tuple.Tuple2<Long,CollectCoordinationResponse> deserializeAccumulatorResult(byte[] serializedAccResults)
- long getMaxBytesPerBatch()
- void initializeState(FunctionInitializationContext context): This method is called when the parallel function instance is created during distributed execution.
- void invoke(IN value, SinkFunction.Context context): Writes the given value to the sink.
- void notifyCheckpointAborted(long checkpointId)
- void notifyCheckpointComplete(long checkpointId)
- void open(org.apache.flink.api.common.functions.OpenContext openContext)
- static byte[] serializeAccumulatorResult(long offset, String version, long lastCheckpointedOffset, List<byte[]> buffer)
- void setOperatorEventGateway(OperatorEventGateway eventGateway)
- void snapshotState(FunctionSnapshotContext context): This method is called when a snapshot for a checkpoint is requested.
Methods inherited from class org.apache.flink.api.common.functions.AbstractRichFunction
getIterationRuntimeContext, getRuntimeContext, setRuntimeContext
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.flink.streaming.api.functions.sink.legacy.SinkFunction
finish, invoke, writeWatermark
-
Constructor Details
-
CollectSinkFunction
-
-
Method Details
-
getMaxBytesPerBatch
public long getMaxBytesPerBatch()
initializeState
Description copied from interface: CheckpointedFunction
This method is called when the parallel function instance is created during distributed execution. Functions typically set up their state storing data structures in this method.
- Specified by:
initializeState in interface CheckpointedFunction
- Parameters:
context - the context for initializing the operator
- Throws:
Exception - Thrown if state could not be created or restored.
-
snapshotState
Description copied from interface: CheckpointedFunction
This method is called when a snapshot for a checkpoint is requested. This acts as a hook to the function to ensure that all state is exposed by means previously offered through FunctionInitializationContext when the Function was initialized, or offered now by FunctionSnapshotContext itself.
- Specified by:
snapshotState in interface CheckpointedFunction
- Parameters:
context - the context for drawing a snapshot of the operator
- Throws:
Exception - Thrown if state could not be created or restored.
-
open
- Specified by:
open in interface org.apache.flink.api.common.functions.RichFunction
- Overrides:
open in class org.apache.flink.api.common.functions.AbstractRichFunction
- Throws:
Exception
-
invoke
Description copied from interface: SinkFunction
Writes the given value to the sink. This function is called for every record.
You have to override this method when implementing a SinkFunction; it is a default method only for backward compatibility with the old-style method.
- Specified by:
invoke in interface SinkFunction<IN>
- Parameters:
value - The input record.
context - Additional context about the input record.
- Throws:
Exception - This method may throw exceptions. Throwing an exception will cause the operation to fail and may trigger recovery.
-
close
- Specified by:
close in interface org.apache.flink.api.common.functions.RichFunction
- Overrides:
close in class org.apache.flink.api.common.functions.AbstractRichFunction
- Throws:
Exception
-
accumulateFinalResults
- Throws:
Exception
-
notifyCheckpointComplete
public void notifyCheckpointComplete(long checkpointId)
- Specified by:
notifyCheckpointComplete in interface org.apache.flink.api.common.state.CheckpointListener
-
notifyCheckpointAborted
public void notifyCheckpointAborted(long checkpointId)
- Specified by:
notifyCheckpointAborted in interface org.apache.flink.api.common.state.CheckpointListener
-
setOperatorEventGateway
-
serializeAccumulatorResult
@VisibleForTesting public static byte[] serializeAccumulatorResult(long offset, String version, long lastCheckpointedOffset, List<byte[]> buffer) throws IOException
- Throws:
IOException
-
deserializeAccumulatorResult
public static org.apache.flink.api.java.tuple.Tuple2<Long,CollectCoordinationResponse> deserializeAccumulatorResult(byte[] serializedAccResults) throws IOException
- Throws:
IOException
-