Class InputPriorityGraphGenerator

java.lang.Object
org.apache.flink.table.planner.plan.nodes.exec.processor.utils.InputPriorityGraphGenerator
Direct Known Subclasses:
InputOrderCalculator, InputPriorityConflictResolver

@Internal public abstract class InputPriorityGraphGenerator extends Object
This class contains algorithm to detect and resolve input priority conflict in an ExecNode graph.

Some batch operators (for example, hash join and nested loop join) have different priorities for their inputs. When some operators are reused, a deadlock may occur due to the conflict in these priorities.

For example, consider the SQL query:

 WITH
   T1 AS (SELECT a, COUNT(*) AS cnt1 FROM x GROUP BY a),
   T2 AS (SELECT d, COUNT(*) AS cnt2 FROM y GROUP BY d)
 SELECT * FROM
   (SELECT cnt1, cnt2 FROM T1 LEFT JOIN T2 ON a = d)
   UNION ALL
   (SELECT cnt1, cnt2 FROM T2 LEFT JOIN T1 ON d = a)
 

When sub-plan reuse are enabled, we'll get the following physical plan:

 Union(all=[true], union=[cnt1, cnt2])
 :- Calc(select=[CAST(cnt1) AS cnt1, cnt2])
 :  +- HashJoin(joinType=[LeftOuterJoin], where=[=(a, d)], select=[a, cnt1, d, cnt2], build=[right])
 :     :- HashAggregate(isMerge=[true], groupBy=[a], select=[a, Final_COUNT(count1$0) AS cnt1], reuse_id=[2])
 :     :  +- Exchange(distribution=[hash[a]])
 :     :     +- LocalHashAggregate(groupBy=[a], select=[a, Partial_COUNT(*) AS count1$0])
 :     :        +- Calc(select=[a])
 :     :           +- TableSourceScan(table=[[default_catalog, default_database, x]], fields=[a, b, c])
 :     +- HashAggregate(isMerge=[true], groupBy=[d], select=[d, Final_COUNT(count1$0) AS cnt2], reuse_id=[1])
 :        +- Exchange(distribution=[hash[d]])
 :           +- LocalHashAggregate(groupBy=[d], select=[d, Partial_COUNT(*) AS count1$0])
 :              +- Calc(select=[d])
 :                 +- TableSourceScan(table=[[default_catalog, default_database, y]], fields=[d, e, f])
 +- Calc(select=[cnt1, CAST(cnt2) AS cnt2])
    +- HashJoin(joinType=[LeftOuterJoin], where=[=(d, a)], select=[d, cnt2, a, cnt1], build=[right])
       :- Reused(reference_id=[1])
       +- Reused(reference_id=[2])
 

Note that the first hash join needs to read all results from the hash aggregate whose reuse id is 1 before reading the results from the hash aggregate whose reuse id is 2, while the second hash join requires the opposite. This physical plan will thus cause a deadlock.

This class maintains a topological graph in which an edge pointing from vertex A to vertex B indicates that the results from vertex A need to be read before those from vertex B. A loop in the graph indicates a deadlock, and different subclasses of this class resolve the conflict in different ways.

For a detailed explanation of the algorithm, see appendix of the design doc.

  • Field Details

    • graph

      protected org.apache.flink.table.planner.plan.nodes.exec.processor.utils.TopologyGraph graph
  • Constructor Details

  • Method Details

    • createTopologyGraph

      protected void createTopologyGraph()
    • resolveInputPriorityConflict

      protected abstract void resolveInputPriorityConflict(ExecNode<?> node, int higherInput, int lowerInput)