Class ForwardForConsecutiveHashPartitioner<T>

Type Parameters:
T - Type of the elements in the Stream
All Implemented Interfaces:
Serializable, ChannelSelector<SerializationDelegate<StreamRecord<T>>>

@Internal public class ForwardForConsecutiveHashPartitioner<T> extends ForwardPartitioner<T>
If a plan contains multiple consecutive, identical hash shuffles, the SQL planner changes all of them except the first to use a forward partitioner, so that the operators can be chained and the unnecessary shuffles avoided.

 A --[hash]--> B --[hash]--> C
            |
            V
 A --[hash]--> B --[forward]--> C

 

However, the consecutive hash operators are sometimes not chained (e.g. when an operator has multiple inputs), and these forward partitioners then turn into forward job edges. Such forward edges still carry the consecutive-hash assumption, so they cannot be changed into rescale/rebalance edges; doing so can lead to incorrect results. This prevents the adaptive batch scheduler from determining the parallelism of job vertices downstream of other forward edges (see FLINK-25046).
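
The consecutive-hash assumption can be illustrated with a small, self-contained sketch. It does not use any Flink classes: the class name, the helper methods, and the modulo-based channel selection are all assumptions made for illustration (Flink's real hash partitioning is more involved). It only shows that hash routing keeps every record of a key on one subtask, while a rebalance-style round-robin routing may split a key across subtasks, which is why a downstream operator that relies on the hash distribution could produce incorrect results.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashAssumptionDemo {

    // Simplified stand-in for hash partitioning (assumption: plain hashCode modulo).
    static int[] hashRoute(List<String> keys, int parallelism) {
        int[] channels = new int[keys.size()];
        for (int i = 0; i < keys.size(); i++) {
            channels[i] = Math.floorMod(keys.get(i).hashCode(), parallelism);
        }
        return channels;
    }

    // Simplified stand-in for rebalance: records are spread round-robin, ignoring keys.
    static int[] roundRobinRoute(List<String> keys, int parallelism) {
        int[] channels = new int[keys.size()];
        for (int i = 0; i < keys.size(); i++) {
            channels[i] = i % parallelism;
        }
        return channels;
    }

    // Returns true if all records sharing a key were routed to the same channel.
    static boolean keysColocated(List<String> keys, int[] channels) {
        Map<String, Integer> owner = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            Integer previous = owner.putIfAbsent(keys.get(i), channels[i]);
            if (previous != null && previous != channels[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "a", "b", "b");
        // Hash routing preserves the per-key co-location a second hash operator expects.
        System.out.println(keysColocated(keys, hashRoute(keys, 2)));
        // Round-robin routing sends the two "a" records to different channels.
        System.out.println(keysColocated(keys, roundRobinRoute(keys, 2)));
    }
}
```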

To solve this, we introduce the ForwardForConsecutiveHashPartitioner. When the SQL planner optimizes multiple consecutive, identical hash shuffles, it should use this partitioner; the runtime framework then changes it to forward or hash after the operator chains are created.


 A --[hash]--> B --[hash]--> C
            |
            V
 A --[hash]--> B --[ForwardForConsecutiveHash]--> C

 

After the operator chains are created, this partitioner is converted to one of the following partitioners:

1. To ForwardPartitioner if this partitioner is intra-chain.

2. To hashPartitioner if this partitioner is inter-chain.
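
The two conversion rules above can be sketched as a small decision function. This is a simplified model, not Flink's implementation: the enum, the class name, and the `resolve` method are hypothetical names introduced here, and the real conversion operates on stream edges and partitioner instances rather than on an enum.

```java
public class ConversionSketch {

    // Hypothetical stand-in for the partitioner kinds named in the text.
    enum Kind { FORWARD, HASH, FORWARD_FOR_CONSECUTIVE_HASH }

    // Resolves a placeholder edge once it is known whether the edge ended up
    // inside an operator chain (intra-chain) or between chains (inter-chain).
    static Kind resolve(Kind kind, boolean intraChain) {
        if (kind != Kind.FORWARD_FOR_CONSECUTIVE_HASH) {
            return kind; // other partitioners are left untouched
        }
        return intraChain ? Kind.FORWARD : Kind.HASH;
    }

    public static void main(String[] args) {
        // Rule 1: an intra-chain edge becomes a plain forward edge.
        System.out.println(resolve(Kind.FORWARD_FOR_CONSECUTIVE_HASH, true));
        // Rule 2: an inter-chain edge falls back to the hash partitioner.
        System.out.println(resolve(Kind.FORWARD_FOR_CONSECUTIVE_HASH, false));
    }
}
```

Deferring the choice until after chaining is what lets the planner keep the chaining opportunity without leaving behind a forward job edge that still depends on the hash distribution.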

This partitioner should only be used for SQL batch jobs that run with the AdaptiveBatchScheduler.

See Also: