Interface OpFusionCodegenSpec


@Internal public interface OpFusionCodegenSpec
An interface for physical operators that support operator fusion codegen.
  • Method Summary

    String doEndInputConsume(int inputId)
        The endInput method does the cleanup work for the operator's corresponding input. For example, the HashAgg operator needs to flush its data and the HashJoin build side needs to build its hash table, so each operator must implement the corresponding cleanup logic in this method.
    void doEndInputProduce(CodeGeneratorContext codegenCtx)
        Generates the Java source code that does the operator's cleanup work. Only the leaf operators in the operator DAG need to generate this code; other intermediate operators normally just call `endInputProduce` on their inputs, unless the operator has some specific logic of its own.
    String doProcessConsume(int inputId, List<GeneratedExpression> inputVars, GeneratedExpression row)
        The process method implements the operator's data-processing logic, so each operator must implement this method to generate the code that processes a row.
    void doProcessProduce(CodeGeneratorContext codegenCtx)
        Generates the Java source code to process rows. Only the leaf operators in the operator DAG need to generate the code that produces rows; other intermediate operators normally just call OpFusionCodegenSpecGenerator.processProduce(CodeGeneratorContext) on their inputs, unless the operator has some specific logic of its own.
    CodeGeneratorContext getCodeGeneratorContext()
        Every operator needs one CodeGeneratorContext to store the context needed during operator fusion codegen.
    ExprCodeGenerator getExprCodeGenerator()
        Gets the ExprCodeGenerator used by this operator during operator fusion codegen.
    Class<? extends org.apache.flink.table.data.RowData> getInputRowDataClass(int inputId)
        Returns the RowData class that the current operator needs for the specified inputId. This is used to tell the upstream operator to wrap the row in the proper RowData type before calling the doProcessConsume method.
    void setup(OpFusionContext opFusionContext)
        Initializes the operator spec.
    Set<Integer> usedInputColumns(int inputId)
        The subset of column indices that should be evaluated before this operator.
    String variablePrefix()
        Prefix used in the current operator's variable names.
  • Method Details

    • setup

      void setup(OpFusionContext opFusionContext)
      Initializes the operator spec and gives it access to the context. This method must be called before the doProduce- and doConsume-related methods.
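
      The ordering constraint can be illustrated with a minimal sketch. OpFusionContextStub and GuardedSpec are hypothetical stand-ins, not Flink classes; the sketch only shows one way a spec might store the context in setup and reject consume calls made before it.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for OpFusionContext; not Flink's real class.
interface OpFusionContextStub {
    String processConsume(List<String> outputVars);
}

// Hypothetical spec that enforces "setup before produce/consume".
class GuardedSpec {
    private OpFusionContextStub fusionContext; // null until setup() runs

    void setup(OpFusionContextStub fusionContext) {
        this.fusionContext = Objects.requireNonNull(fusionContext, "fusionContext");
    }

    String doProcessConsume(int inputId, List<String> inputVars) {
        if (fusionContext == null) {
            throw new IllegalStateException("setup() must be called before doProcessConsume()");
        }
        // Pass-through operator: forward the input variables downstream unchanged.
        return fusionContext.processConsume(inputVars);
    }
}
```

      Failing fast with IllegalStateException makes a missing setup call visible at codegen time rather than as a NullPointerException deep inside generated code.
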
    • variablePrefix

      String variablePrefix()
      Prefix used in the current operator's variable names.
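
      Because fused operators end up in the same generated code, their local variable names must not collide. A minimal sketch, assuming a hypothetical PrefixedNamer helper (not part of Flink) and made-up prefixes such as hashAgg_:

```java
// Hypothetical helper, not a Flink API: derives unique local-variable names
// for generated code from an operator's variablePrefix().
class PrefixedNamer {
    private final String prefix;
    private int counter = 0;

    PrefixedNamer(String prefix) {
        this.prefix = prefix;
    }

    // Returns e.g. "hashAgg_count$0", then "hashAgg_count$1", ...
    String newName(String base) {
        return prefix + base + "$" + counter++;
    }
}
```

      Two operators that both ask for a variable named count get distinct names, because each one's prefix differs.
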
    • usedInputColumns

      Set<Integer> usedInputColumns(int inputId)
      The subset of column index those should be evaluated before this operator.

      We will use this to insert some code to access those columns that are actually used by current operator before calling doProcessConsume().
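
      The idea can be sketched as follows, assuming a hypothetical UsedColumnAccess helper and a simplified list of primitive column types; Flink's real code generator also has to handle nulls and non-primitive types. Only columns in the used set produce an access statement, so unused fields are never read.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Flink's implementation: generate field-access code
// only for the columns that the operator reports as used.
class UsedColumnAccess {

    // columnTypes.get(i) is the Java primitive type of column i, e.g. "int" or "long".
    // Null handling is omitted for brevity.
    static String genAccessCode(String rowTerm, List<String> columnTypes, Set<Integer> usedColumns) {
        StringBuilder code = new StringBuilder();
        for (int i : new TreeSet<>(usedColumns)) { // sorted for deterministic output
            String type = columnTypes.get(i);
            // RowData getters follow the getInt/getLong/getDouble naming pattern.
            String getter = "get" + Character.toUpperCase(type.charAt(0)) + type.substring(1);
            code.append(type).append(" field$").append(i)
                .append(" = ").append(rowTerm).append('.').append(getter)
                .append('(').append(i).append(");\n");
        }
        return code.toString();
    }
}
```

      For a four-column input where only columns 1 and 3 are used, the generated code reads just those two fields.
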

    • getInputRowDataClass

      Class<? extends org.apache.flink.table.data.RowData> getInputRowDataClass(int inputId)
      Returns the RowData class that the current operator needs for the specified inputId. This is used to tell the upstream operator to wrap the row in the proper RowData type before calling the doProcessConsume method. For example, the HashJoin build side needs BinaryRowData.
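
      A minimal sketch of the upstream decision, using hypothetical stub classes in place of Flink's RowData, GenericRowData and BinaryRowData, and a hypothetical toBinaryRow converter called from the generated code. The row is passed through when its class already satisfies the requirement and converted otherwise.

```java
// Hypothetical stand-ins for Flink's RowData hierarchy; not the real classes.
interface RowDataStub {}
class GenericRowDataStub implements RowDataStub {}
class BinaryRowDataStub implements RowDataStub {}

// Sketch of how an upstream operator might honour getInputRowDataClass().
class RowWrapping {

    // Returns the generated expression to hand downstream.
    static String wrapFor(Class<? extends RowDataStub> required,
                          Class<? extends RowDataStub> produced,
                          String rowTerm) {
        if (required.isAssignableFrom(produced)) {
            return rowTerm; // already the right type: no copy needed
        }
        if (required.equals(BinaryRowDataStub.class)) {
            // "toBinaryRow" is a hypothetical converter used by the generated code.
            return "toBinaryRow(" + rowTerm + ")";
        }
        throw new UnsupportedOperationException("No conversion to " + required.getSimpleName());
    }
}
```
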
    • getCodeGeneratorContext

      CodeGeneratorContext getCodeGeneratorContext()
      Every operator needs one CodeGeneratorContext to store the context needed during operator fusion codegen.
    • getExprCodeGenerator

      ExprCodeGenerator getExprCodeGenerator()
      Gets the ExprCodeGenerator used by this operator during operator fusion codegen.
    • doProcessProduce

      void doProcessProduce(CodeGeneratorContext codegenCtx)
      Generates the Java source code to process rows. Only the leaf operators in the operator DAG need to generate the code that produces rows; other intermediate operators normally just call OpFusionCodegenSpecGenerator.processProduce(CodeGeneratorContext) on their inputs, unless the operator has some specific logic of its own. The leaf operator produces a row first and then calls the OpFusionContext.processConsume(List) method to consume it.

      The code generated by the leaf operator is saved in fusionCtx, which is why this method has no return value.
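
      A simplified sketch of the produce side: a hypothetical leaf operator emits a loop that reads each row, asks the downstream chain (through FusionContextStub, a stand-in for OpFusionContext.processConsume) for the code that consumes the produced fields, and saves the result instead of returning it. All names here, including the generated input iterator, are illustrative assumptions.

```java
import java.util.List;

// Hypothetical stand-in for OpFusionContext: only processConsume is modelled.
interface FusionContextStub {
    String processConsume(List<String> outputVars);
}

// Hypothetical leaf operator that reads (int, long) rows from an iterator.
class LeafScanSpec {
    private final FusionContextStub fusionCtx;
    // Stands in for the context into which the generated code is saved.
    private final StringBuilder generatedCode = new StringBuilder();

    LeafScanSpec(FusionContextStub fusionCtx) {
        this.fusionCtx = fusionCtx;
    }

    void doProcessProduce() {
        // Ask the downstream chain for the code that consumes the produced fields.
        String consumeCode = fusionCtx.processConsume(List.of("field$0", "field$1"));
        generatedCode
            .append("while (input.hasNext()) {\n")
            .append("  RowData row = input.next();\n")
            .append("  int field$0 = row.getInt(0);\n")
            .append("  long field$1 = row.getLong(1);\n")
            .append(consumeCode)
            .append("}\n");
    }

    String code() {
        return generatedCode.toString();
    }
}
```
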

    • doProcessConsume

      String doProcessConsume(int inputId, List<GeneratedExpression> inputVars, GeneratedExpression row)
      The process method implements the operator's data-processing logic, so each operator must implement this method to generate the code that processes a row. It should only be called from OpFusionCodegenSpecGenerator.processConsume(List, String).

      Note: An operator can consume its input either as RowData (row) or as a list of variables (inputVars).

      Parameters:
      inputId - This is numbered starting from 1, and `1` indicates the first input.
      inputVars - field variables of the current input.
      row - row variable of the current input.
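
      A sketch of the consume side for a pipelined operator: a hypothetical filter (field > threshold) wraps the downstream code in a condition on one of its input variables and returns the generated string. Plain strings and FusionContextStub are simplified stand-ins for GeneratedExpression and OpFusionContext.

```java
import java.util.List;

// Hypothetical stand-in for OpFusionContext: only processConsume is modelled.
interface FusionContextStub {
    String processConsume(List<String> outputVars);
}

// Hypothetical single-input filter "field > threshold" (not a Flink class).
class FilterSpec {
    private final FusionContextStub fusionCtx;
    private final int fieldIndex;
    private final long threshold;

    FilterSpec(FusionContextStub fusionCtx, int fieldIndex, long threshold) {
        this.fusionCtx = fusionCtx;
        this.fieldIndex = fieldIndex;
        this.threshold = threshold;
    }

    // Consumes the field variables (inputVars); the row variable is unused here.
    String doProcessConsume(int inputId, List<String> inputVars, String row) {
        if (inputId != 1) {
            throw new IllegalArgumentException("FilterSpec has a single input, got " + inputId);
        }
        // A filter forwards its input variables unchanged to the downstream operator.
        String downstreamCode = fusionCtx.processConsume(inputVars);
        return "if (" + inputVars.get(fieldIndex) + " > " + threshold + "L) {\n"
            + downstreamCode
            + "}\n";
    }
}
```
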
    • doEndInputProduce

      void doEndInputProduce(CodeGeneratorContext codegenCtx)
      Generates the Java source code that does the operator's cleanup work. Only the leaf operators in the operator DAG need to generate this code; other intermediate operators normally just call `endInputProduce` on their inputs, unless the operator has some specific logic of its own.

      The code generated by the leaf operator is saved in fusionCtx, which is why this method has no return value.
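
      A minimal sketch of a leaf operator's end-of-input generation. EndInputContextStub stands in for the endInputConsume part of OpFusionContext, and the StringBuilder stands in for the context where the code is saved; both are hypothetical. Once its input is exhausted, the leaf only needs to append the downstream operators' cleanup code.

```java
// Hypothetical stand-in for the endInputConsume part of OpFusionContext.
interface EndInputContextStub {
    String endInputConsume();
}

// Hypothetical leaf operator generating its end-of-input code.
class LeafEndInputSpec {
    private final EndInputContextStub fusionCtx;
    // Stands in for the context into which the generated code is saved.
    private final StringBuilder generatedCode = new StringBuilder();

    LeafEndInputSpec(EndInputContextStub fusionCtx) {
        this.fusionCtx = fusionCtx;
    }

    void doEndInputProduce() {
        // The leaf's input is exhausted: trigger the downstream cleanup code.
        generatedCode.append("// input exhausted\n").append(fusionCtx.endInputConsume());
    }

    String code() {
        return generatedCode.toString();
    }
}
```
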

    • doEndInputConsume

      String doEndInputConsume(int inputId)
      The endInput method does the cleanup work for the operator's corresponding input. For example, the HashAgg operator needs to flush its data and the HashJoin build side needs to build its hash table, so each operator must implement the corresponding cleanup logic in this method.

      For blocking operators such as HashAgg, the OpFusionContext.processConsume(List, String) method needs to be called first to consume the data, followed by the `endInputConsume` method to do the cleanup work of the downstream operators. For pipeline operators such as Project, you only need to call the `endInputConsume` method.

      Parameters:
      inputId - This is numbered starting from 1, and `1` indicates the first input.
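
      The blocking and pipelined cases described above can be contrasted in a simplified sketch. FusionContextStub, RecordingContext, aggMap, emit and downstreamEndInput are hypothetical names; only the ordering (consume the buffered data first, then propagate endInputConsume) follows the description.

```java
import java.util.List;

// Hypothetical stand-in for the parts of OpFusionContext used here.
interface FusionContextStub {
    String processConsume(List<String> outputVars);
    String endInputConsume();
}

// Blocking operator (HashAgg-like): emit buffered results, then clean up downstream.
class BlockingAggSpec {
    private final FusionContextStub fusionCtx;

    BlockingAggSpec(FusionContextStub fusionCtx) {
        this.fusionCtx = fusionCtx;
    }

    String doEndInputConsume(int inputId) {
        String emitCode = fusionCtx.processConsume(List.of("aggKey", "aggValue"));
        return "for (java.util.Map.Entry<Long, Long> e : aggMap.entrySet()) {\n"
            + "  long aggKey = e.getKey();\n"
            + "  long aggValue = e.getValue();\n"
            + emitCode
            + "}\n"
            + fusionCtx.endInputConsume();
    }
}

// Pipelined operator (Project-like): nothing is buffered, so just propagate endInput.
class PipelineProjectSpec {
    private final FusionContextStub fusionCtx;

    PipelineProjectSpec(FusionContextStub fusionCtx) {
        this.fusionCtx = fusionCtx;
    }

    String doEndInputConsume(int inputId) {
        return fusionCtx.endInputConsume();
    }
}

// Deterministic context used to inspect the generated code.
class RecordingContext implements FusionContextStub {
    public String processConsume(List<String> outputVars) {
        return "  emit(" + String.join(", ", outputVars) + ");\n";
    }

    public String endInputConsume() {
        return "downstreamEndInput();\n";
    }
}
```
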