Class ProcessTableFunction<T>

java.lang.Object
org.apache.flink.table.functions.UserDefinedFunction
org.apache.flink.table.functions.ProcessTableFunction<T>
Type Parameters:
T - The type of the output row. Either an explicit composite type or an atomic type that is implicitly wrapped into a row consisting of one field.
All Implemented Interfaces:
Serializable, FunctionDefinition

@PublicEvolving public abstract class ProcessTableFunction<T> extends UserDefinedFunction
Base class for a user-defined process table function. A process table function (PTF) maps zero, one, or multiple tables to zero, one, or multiple rows (or structured types). Scalar arguments are also supported. If the output record consists of only one field, the wrapper can be omitted, and a scalar value can be emitted that will be implicitly wrapped into a row by the runtime.

PTFs are the most powerful function kind for Flink SQL and Table API. They enable implementing user-defined operators that can be as feature-rich as built-in operations. PTFs have access to Flink's managed state, event-time and timer services, underlying table changelogs, and can take multiple ordered and/or partitioned tables to produce a new table.

Table Semantics and Virtual Processors

PTFs can produce a new table by consuming tables as arguments. For scalability, input tables are distributed across so-called "virtual processors". A virtual processor, as defined by the SQL standard, executes a PTF instance and has access only to a portion of the entire table. The argument declaration determines the size of that portion and how data is co-located. Conceptually, tables can be processed either "as row" (i.e. with row semantics) or "as set" (i.e. with set semantics).

Table Argument with Row Semantics

A PTF that takes a table with row semantics assumes that there is no correlation between rows and that each row can be processed independently. The framework is free to distribute rows across virtual processors in any way, and each virtual processor has access only to the currently processed row.

Table Argument with Set Semantics

A PTF that takes a table with set semantics assumes that there is a correlation between rows. When calling the function, the PARTITION BY clause defines the columns for correlation. The framework ensures that all rows belonging to the same set are co-located. A PTF instance is able to access all rows belonging to the same set. In other words, the virtual processor is scoped by a key context.

It is also possible not to provide a key (ArgumentTrait.OPTIONAL_PARTITION_BY), in which case only one virtual processor handles the entire table, thereby losing scalability benefits.
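The co-location guarantee of set semantics can be modelled in plain Java without any Flink dependency. The following is a conceptual sketch only: `VirtualProcessorSketch` and `partitionBy` are hypothetical names, rows are modelled as `Object[]`, and each map entry stands in for the key context of one virtual processor. It does not reflect how the Flink runtime distributes data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only; none of these names exist in Flink.
class VirtualProcessorSketch {

    // Set semantics: rows that share the value of the key column end up in the
    // same group. One group stands in for one virtual processor's key context.
    static Map<Object, List<Object[]>> partitionBy(List<Object[]> table, int keyColumn) {
        Map<Object, List<Object[]>> sets = new LinkedHashMap<>();
        for (Object[] row : table) {
            sets.computeIfAbsent(row[keyColumn], key -> new ArrayList<>()).add(row);
        }
        return sets;
    }

    public static void main(String[] args) {
        List<Object[]> orders = List.of(
                new Object[] {"alice", 10},
                new Object[] {"bob", 7},
                new Object[] {"alice", 5});

        // Equivalent in spirit to PARTITION BY on the first column.
        Map<Object, List<Object[]>> sets = partitionBy(orders, 0);
        sets.forEach((key, rows) -> System.out.println(key + " -> " + rows.size() + " row(s)"));
    }
}
```

With row semantics there is no such grouping: every row could be handed to any processor on its own. Without a PARTITION BY key, the whole table would form a single group.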

Implementation

The behavior of a ProcessTableFunction can be defined by implementing a custom evaluation method. The evaluation method must be public, non-static, and named eval. Overloading is not supported.
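The requirements on the evaluation method can be expressed as a small reflective check. This is a minimal sketch in plain Java, not Flink's actual validation logic: `EvalMethodCheck`, `findEvalMethods`, and `hasSingleEval` are hypothetical names, and the check only inspects methods declared directly on the given class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; Flink performs its own validation.
class EvalMethodCheck {

    // Collects methods that are named "eval", public, and not static.
    static List<Method> findEvalMethods(Class<?> clazz) {
        List<Method> candidates = new ArrayList<>();
        for (Method method : clazz.getDeclaredMethods()) {
            int modifiers = method.getModifiers();
            if (method.getName().equals("eval")
                    && Modifier.isPublic(modifiers)
                    && !Modifier.isStatic(modifiers)
                    && !method.isSynthetic()) {
                candidates.add(method);
            }
        }
        return candidates;
    }

    // Exactly one candidate is expected because overloading is not supported.
    static boolean hasSingleEval(Class<?> clazz) {
        return findEvalMethods(clazz).size() == 1;
    }

    // Sample classes used to exercise the check.
    static class Valid {
        public void eval(Integer a, Integer b) {}
    }

    static class Overloaded {
        public void eval(Integer a) {}
        public void eval(String s) {}
    }

    static class StaticEval {
        public static void eval(Integer a) {}
    }

    public static void main(String[] args) {
        System.out.println("Valid: " + hasSingleEval(Valid.class));
        System.out.println("Overloaded: " + hasSingleEval(Overloaded.class));
        System.out.println("StaticEval: " + hasSingleEval(StaticEval.class));
    }
}
```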

For storing a user-defined function in a catalog, the class must have a default constructor and must be instantiable at runtime. Anonymous functions in Table API can only be persisted if the function object is not stateful (i.e. it contains only transient and static fields).

Data Types

By default, input and output data types are automatically extracted using reflection. This includes the generic argument T of the class for determining an output data type. Input arguments are derived from the eval() method. If the reflective information is not sufficient, it can be supplemented with FunctionHint, ArgumentHint, and DataTypeHint annotations.

The following examples show how to specify data types:


 // Function that accepts two scalar INT arguments and emits their sum as an implicit ROW < INT >
 class AdditionFunction extends ProcessTableFunction<Integer> {
   public void eval(Integer a, Integer b) {
     collect(a + b);
   }
 }

 // Function that produces an explicit ROW < i INT, s STRING > from arguments, the function hint helps in
 // declaring the row's fields
 @FunctionHint(output = @DataTypeHint("ROW< i INT, s STRING >"))
 class DuplicatorFunction extends ProcessTableFunction<Row> {
   public void eval(Integer i, String s) {
     collect(Row.of(i, s));
     collect(Row.of(i, s));
   }
 }

 // Function that accepts DECIMAL(10, 4) and emits it as an explicit ROW < DECIMAL(10, 4) >
 @FunctionHint(output = @DataTypeHint("ROW< DECIMAL(10, 4) >"))
 class DuplicatorFunction extends ProcessTableFunction<Row> {
   public void eval(@DataTypeHint("DECIMAL(10, 4)") BigDecimal d) {
     collect(Row.of(d));
     collect(Row.of(d));
   }
 }
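The distinction between implicit and explicit rows in the examples above, together with the null handling documented for collect(), can be modelled in plain Java. This is a conceptual sketch only, not Flink runtime code: `RowWrappingSketch`, `wrapImplicit`, and `passExplicit` are hypothetical names, and rows are modelled as `Object[]`.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model only; this is not how the Flink runtime is implemented.
class RowWrappingSketch {

    // Implicit row: an atomic value becomes the only field of a one-field row.
    // A null value still produces a row; its single field is null.
    static List<Object[]> wrapImplicit(List<Object> emitted) {
        List<Object[]> rows = new ArrayList<>();
        for (Object value : emitted) {
            rows.add(new Object[] {value});
        }
        return rows;
    }

    // Explicit row: the emitted object already is the row.
    // A null row is skipped instead of being forwarded.
    static List<Object[]> passExplicit(List<Object[]> emitted) {
        List<Object[]> rows = new ArrayList<>();
        for (Object[] row : emitted) {
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Object> atomic = new ArrayList<>();
        atomic.add(3);
        atomic.add(null);
        System.out.println("implicit rows: " + wrapImplicit(atomic).size());

        List<Object[]> explicit = new ArrayList<>();
        explicit.add(new Object[] {1, "a"});
        explicit.add(null);
        System.out.println("explicit rows: " + passExplicit(explicit).size());
    }
}
```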
 

Arguments

The ArgumentHint annotation enables declaring the name, data type, and kind of each argument (i.e. ArgumentTrait.SCALAR, ArgumentTrait.TABLE_AS_SET, or ArgumentTrait.TABLE_AS_ROW). It allows specifying other traits for table arguments as well:


 // Function that has two arguments:
 // "input_table" (a table with set semantics) and "threshold" (a scalar value)
 class ThresholdFunction extends ProcessTableFunction<Integer> {
   public void eval(
       // For table arguments, a data type for Row is optional (leading to polymorphic behavior)
       @ArgumentHint(value = ArgumentTrait.TABLE_AS_SET, name = "input_table") Row t,
       // Scalar arguments require a data type either explicit or via reflection
       @ArgumentHint(value = ArgumentTrait.SCALAR, name = "threshold") Integer threshold) {
     int amount = t.getFieldAs("amount");
     if (amount >= threshold) {
       collect(amount);
     }
   }
 }
 

Table arguments can declare a concrete data type (of either row or structured type) or accept any type of row in polymorphic fashion:


 // Function with explicit table argument type of row
 class MyPTF extends ProcessTableFunction<String> {
   public void eval(Context ctx, @ArgumentHint(value = ArgumentTrait.TABLE_AS_SET, type = "ROW < s STRING >") Row t) {
     TableSemantics semantics = ctx.tableSemanticsFor("t");
     // Always returns "ROW < s STRING >"
     semantics.dataType();
     ...
   }
 }

 // Function with explicit table argument type of structured type "Customer"
 class MyPTF extends ProcessTableFunction<String> {
   public void eval(Context ctx, @ArgumentHint(value = ArgumentTrait.TABLE_AS_SET) Customer c) {
     TableSemantics semantics = ctx.tableSemanticsFor("c");
     // Always returns structured type of "Customer"
     semantics.dataType();
     ...
   }
 }

 // Function with polymorphic table argument
 class MyPTF extends ProcessTableFunction<String> {
   public void eval(Context ctx, @ArgumentHint(value = ArgumentTrait.TABLE_AS_SET) Row t) {
     TableSemantics semantics = ctx.tableSemanticsFor("t");
     // Always returns "ROW" but content depends on the table that is passed into the call
     semantics.dataType();
     ...
   }
 }
 

Context

A ProcessTableFunction.Context can be added as a first argument to the eval() method for additional information about the input tables and other services provided by the framework:


 // a function that accesses the Context for reading the PARTITION BY columns and
 // excluding them when building a result string
 class ConcatNonKeysFunction extends ProcessTableFunction<String> {
   public void eval(Context ctx, @ArgumentHint(ArgumentTrait.TABLE_AS_SET) Row inputTable) {
     TableSemantics semantics = ctx.tableSemanticsFor("inputTable");
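     // partitionByColumns() returns an int[], so box it explicitly;
     // Arrays.asList(int[]) would yield a List<int[]>
     List<Integer> keys =
         Arrays.stream(semantics.partitionByColumns()).boxed().collect(Collectors.toList());
     // eval() returns void, so the result is emitted via collect()
     collect(
         IntStream.range(0, inputTable.getArity())
             .filter(pos -> !keys.contains(pos))
             .mapToObj(inputTable::getField)
             .map(String::valueOf)
             .collect(Collectors.joining(", ")));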
   }
 }
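The key-exclusion logic of this example can be tried outside Flink. The sketch below is a plain-Java stand-in under stated assumptions: the row is modelled as an `Object[]`, the PARTITION BY columns are passed as an `int[]` of positions (assumed to match what `partitionByColumns()` provides), and `ConcatNonKeysSketch` and `concatNonKeys` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Plain-Java stand-in for the logic of the Context example; not Flink API.
class ConcatNonKeysSketch {

    static String concatNonKeys(Object[] row, int[] partitionByColumns) {
        // Box the positions so that List.contains can be used for the lookup.
        List<Integer> keys =
                Arrays.stream(partitionByColumns).boxed().collect(Collectors.toList());

        return IntStream.range(0, row.length)
                .filter(pos -> !keys.contains(pos))          // drop key columns
                .mapToObj(pos -> String.valueOf(row[pos]))   // null-safe conversion
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        Object[] row = {"alice", 10, "EUR"};
        // Column 0 is the partition key and is therefore excluded.
        System.out.println(concatNonKeys(row, new int[] {0}));
    }
}
```

Using `String.valueOf` instead of `Object::toString` avoids a NullPointerException when a field is null.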
 
  • Constructor Details

    • ProcessTableFunction

      public ProcessTableFunction()
  • Method Details

    • setCollector

      public final void setCollector(org.apache.flink.util.Collector<T> collector)
      Internal use. Sets the current collector.
    • collect

      protected final void collect(T row)
      Emits an (implicit or explicit) output row.

      If null is emitted as an explicit row, it will be skipped by the runtime. For implicit rows, the row's field will be null.

      Parameters:
      row - the output row
    • getKind

      public final FunctionKind getKind()
      Description copied from interface: FunctionDefinition
      Returns the kind of function this definition describes.
    • getTypeInference

      public TypeInference getTypeInference(DataTypeFactory typeFactory)
      Description copied from class: UserDefinedFunction
      Returns the logic for performing type inference of a call to this function definition.

      The type inference process is responsible for inferring unknown types of input arguments, validating input arguments, and producing result types. The type inference process happens independently of the function body. The output of the type inference is used to search for a corresponding runtime implementation.

      Instances of type inference can be created by using TypeInference.newBuilder().

      See BuiltInFunctionDefinitions for concrete usage examples.

      The type inference for user-defined functions is automatically extracted using reflection. It does this by analyzing implementation methods such as eval() or accumulate() and the generic parameters of a function class if present. If the reflective information is not sufficient, it can be supported and enriched with DataTypeHint and FunctionHint annotations.

      Note: Overriding this method is only recommended for advanced users. If a custom type inference is specified, it is the responsibility of the implementer to make sure that the output of the type inference process matches the implementation method:

      The implementation method must comply with each DataType.getConversionClass() returned by the type inference. For example, if DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class) is an expected argument type, the method must accept a call eval(java.sql.Timestamp).

      Regular Java calling semantics (including type widening and autoboxing) are applied when calling an implementation method, which means that the signature can also be eval(java.lang.Object).

      The runtime will take care of converting the data to the data format specified by the DataType.getConversionClass() coming from the type inference logic.

      Specified by:
      getTypeInference in interface FunctionDefinition
      Specified by:
      getTypeInference in class UserDefinedFunction
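
The note on Java calling semantics in getTypeInference can be illustrated with plain reflection. The following sketch does not use any Flink classes: `CallingSemanticsSketch`, `invokeEval`, and the two sample function classes are hypothetical. It only shows that a value of conversion class java.sql.Timestamp is accepted both by eval(java.sql.Timestamp) and by eval(java.lang.Object).

```java
import java.lang.reflect.Method;
import java.sql.Timestamp;

// Illustration of regular Java calling semantics; not Flink's invocation code.
class CallingSemanticsSketch {

    // Signature that matches the conversion class exactly.
    static class ExactFunction {
        public String eval(Timestamp ts) {
            return "Timestamp";
        }
    }

    // Wider signature that accepts any object, including a Timestamp.
    static class LenientFunction {
        public String eval(Object o) {
            return "Object:" + o.getClass().getSimpleName();
        }
    }

    // Finds a one-argument eval method whose parameter type accepts the argument
    // and invokes it reflectively.
    static Object invokeEval(Object function, Object argument) {
        for (Method method : function.getClass().getMethods()) {
            if (method.getName().equals("eval")
                    && method.getParameterCount() == 1
                    && method.getParameterTypes()[0].isInstance(argument)) {
                try {
                    return method.invoke(function, argument);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Invocation of eval failed", e);
                }
            }
        }
        throw new IllegalStateException("No matching eval method found");
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(0L);
        System.out.println(invokeEval(new ExactFunction(), ts));
        System.out.println(invokeEval(new LenientFunction(), ts));
    }
}
```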