Class ProcessTableFunction<T>
- Type Parameters:
T - The type of the output row. Either an explicit composite type or an atomic type that is implicitly wrapped into a row consisting of one field.
- All Implemented Interfaces:
Serializable, FunctionDefinition
Process table functions (PTFs) are the most powerful function kind for Flink SQL and Table API. They enable implementing user-defined operators that can be as feature-rich as built-in operations. PTFs have access to Flink's managed state, event-time and timer services, and the underlying table changelogs, and they can take multiple ordered and/or partitioned tables to produce a new table.
Table Semantics and Virtual Processors
PTFs can produce a new table by consuming tables as arguments. For scalability, input tables are distributed across so-called "virtual processors". A virtual processor, as defined by the SQL standard, executes a PTF instance and has access only to a portion of the entire table. The argument declaration decides about the size of the portion and co-location of data. Conceptually, tables can be processed either "as row" (i.e. with row semantics) or "as set" (i.e. with set semantics).
Table Argument with Row Semantics
A PTF that takes a table with row semantics assumes that there is no correlation between rows and that each row can be processed independently. The framework is free to distribute rows across virtual processors in any way, and each virtual processor has access only to the row currently being processed.
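The row-semantics contract can be illustrated with a plain-Java sketch (deliberately not Flink API): the per-row logic below, a hypothetical processRow, never consults other rows, so any split of the table across virtual processors emits the same multiset of results.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RowSemanticsSketch {

    // Hypothetical per-row logic of a PTF with row semantics:
    // each input row is handled without looking at any other row.
    static String processRow(int amount) {
        return "amount=" + amount;
    }

    public static void main(String[] args) {
        List<Integer> table = List.of(3, 1, 2);

        // The framework may split this table across virtual processors
        // arbitrarily; because rows are independent, the emitted rows
        // are the same regardless of the split.
        List<String> out = table.stream()
                .map(RowSemanticsSketch::processRow)
                .collect(Collectors.toList());
        System.out.println(out); // [amount=3, amount=1, amount=2]
    }
}
```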
Table Argument with Set Semantics
A PTF that takes a table with set semantics assumes that there is a correlation between rows. When calling the function, the PARTITION BY clause defines the columns for correlation. The framework ensures that all rows belonging to the same set are co-located. A PTF instance is able to access all rows belonging to the same set. In other words: the virtual processor is scoped by a key context.
It is also possible not to provide a key (ArgumentTrait.OPTIONAL_PARTITION_BY), in
which case only one virtual processor handles the entire table, thereby losing scalability
benefits.
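The difference between a keyed and an un-keyed table argument can be sketched in plain Java (again, not Flink API): with PARTITION BY, each virtual processor aggregates one key's set of rows; with ArgumentTrait.OPTIONAL_PARTITION_BY and no key supplied, a single processor sees the whole table. The OrderRow type and the sum aggregation are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SetSemanticsSketch {

    // Made-up input row type for the illustration.
    record OrderRow(String customer, int amount) {}

    // PARTITION BY customer: rows are co-located per key, and each
    // virtual processor aggregates exactly one key's set of rows.
    static Map<String, Integer> withPartitionBy(List<OrderRow> table) {
        return table.stream().collect(Collectors.groupingBy(
                OrderRow::customer,
                Collectors.summingInt(OrderRow::amount)));
    }

    // OPTIONAL_PARTITION_BY without a key: one virtual processor
    // handles the entire table as a single set.
    static int withoutPartitionBy(List<OrderRow> table) {
        return table.stream().mapToInt(OrderRow::amount).sum();
    }

    public static void main(String[] args) {
        List<OrderRow> table = List.of(
                new OrderRow("alice", 10),
                new OrderRow("bob", 5),
                new OrderRow("alice", 7));
        System.out.println(withPartitionBy(table));    // one sum per customer: alice=17, bob=5
        System.out.println(withoutPartitionBy(table)); // 22
    }
}
```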
Implementation
The behavior of a ProcessTableFunction can be defined by implementing a custom
evaluation method. The evaluation method must be declared publicly, not static, and named
eval. Overloading is not supported.
For storing a user-defined function in a catalog, the class must have a default constructor and must be instantiable during runtime. Anonymous functions in Table API can only be persisted if the function object is not stateful (i.e. it contains only transient and static fields).
Data Types
By default, input and output data types are automatically extracted using reflection. This
includes the generic argument T of the class for determining an output data type. Input
arguments are derived from the eval() method. If the reflective information is not
sufficient, it can be supported and enriched with FunctionHint, ArgumentHint, and
DataTypeHint annotations.
The following examples show how to specify data types:
// Function that accepts two scalar INT arguments and emits them as an implicit ROW<INT>
class AdditionFunction extends ProcessTableFunction<Integer> {
  public void eval(Integer a, Integer b) {
    collect(a + b);
  }
}
// Function that produces an explicit ROW<i INT, s STRING> from arguments, the function hint helps in
// declaring the row's fields
@FunctionHint(output = @DataTypeHint("ROW<i INT, s STRING>"))
class DuplicatorFunction extends ProcessTableFunction<Row> {
  public void eval(Integer i, String s) {
    collect(Row.of(i, s));
    collect(Row.of(i, s));
  }
}
// Function that accepts DECIMAL(10, 4) and emits it as an explicit ROW<DECIMAL(10, 4)>
@FunctionHint(output = @DataTypeHint("ROW<DECIMAL(10, 4)>"))
class DuplicatorFunction extends ProcessTableFunction<Row> {
  public void eval(@DataTypeHint("DECIMAL(10, 4)") BigDecimal d) {
    collect(Row.of(d));
    collect(Row.of(d));
  }
}
Arguments
The ArgumentHint annotation enables declaring the name, data type, and kind of each
argument (i.e. ArgumentTrait.SCALAR, ArgumentTrait.TABLE_AS_SET, or ArgumentTrait.TABLE_AS_ROW).
It allows specifying other traits for table arguments as well:
// Function that has two arguments:
// "input_table" (a table with set semantics) and "threshold" (a scalar value)
class ThresholdFunction extends ProcessTableFunction<Integer> {
  public void eval(
      // For table arguments, a data type for Row is optional (leading to polymorphic behavior)
      @ArgumentHint(value = ArgumentTrait.TABLE_AS_SET, name = "input_table") Row t,
      // Scalar arguments require a data type, either explicit or via reflection
      @ArgumentHint(value = ArgumentTrait.SCALAR, name = "threshold") Integer threshold) {
    int amount = t.getFieldAs("amount");
    if (amount >= threshold) {
      collect(amount);
    }
  }
}
Table arguments can declare a concrete data type (of either row or structured type) or accept any type of row in polymorphic fashion:
// Function with an explicit table argument type of row
class MyPTF extends ProcessTableFunction<String> {
  public void eval(
      Context ctx,
      @ArgumentHint(value = ArgumentTrait.TABLE_AS_SET, type = "ROW<s STRING>") Row t) {
    TableSemantics semantics = ctx.tableSemanticsFor("t");
    // Always returns "ROW<s STRING>"
    semantics.dataType();
    ...
  }
}
// Function with an explicit table argument type of structured type "Customer"
class MyPTF extends ProcessTableFunction<String> {
  public void eval(Context ctx, @ArgumentHint(value = ArgumentTrait.TABLE_AS_SET) Customer c) {
    TableSemantics semantics = ctx.tableSemanticsFor("c");
    // Always returns the structured type of "Customer"
    semantics.dataType();
    ...
  }
}
// Function with a polymorphic table argument
class MyPTF extends ProcessTableFunction<String> {
  public void eval(Context ctx, @ArgumentHint(value = ArgumentTrait.TABLE_AS_SET) Row t) {
    TableSemantics semantics = ctx.tableSemanticsFor("t");
    // Always returns "ROW", but the content depends on the table that is passed into the call
    semantics.dataType();
    ...
  }
}
Context
A ProcessTableFunction.Context can be added as a first argument to the eval() method for additional
information about the input tables and other services provided by the framework:
// Function that accesses the Context for reading the PARTITION BY columns and
// excludes them when building a result string
class ConcatNonKeysFunction extends ProcessTableFunction<String> {
  public void eval(Context ctx, @ArgumentHint(ArgumentTrait.TABLE_AS_SET) Row inputTable) {
    TableSemantics semantics = ctx.tableSemanticsFor("inputTable");
    Set<Integer> keys = Arrays.stream(semantics.partitionByColumns())
        .boxed()
        .collect(Collectors.toSet());
    // eval() returns void, so the result string is emitted via collect()
    collect(
        IntStream.range(0, inputTable.getArity())
            .filter(pos -> !keys.contains(pos))
            .mapToObj(inputTable::getField)
            .map(Object::toString)
            .collect(Collectors.joining(", ")));
  }
}
Nested Class Summary
Nested Classes:
- static interface ProcessTableFunction.Context: Context that can be added as a first argument to the eval() method for additional information about the input tables and other services provided by the framework.
Constructor Summary
Constructors:
- ProcessTableFunction()
Method Summary
- protected final void collect(T row): Emits an (implicit or explicit) output row.
- final FunctionKind getKind(): Returns the kind of function this definition describes.
- TypeInference getTypeInference(DataTypeFactory typeFactory): Returns the logic for performing type inference of a call to this function definition.
- final void setCollector(org.apache.flink.util.Collector<T> collector): Internal use.
Methods inherited from class org.apache.flink.table.functions.UserDefinedFunction:
close, functionIdentifier, open, toString
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.flink.table.functions.FunctionDefinition:
getRequirements, isDeterministic, supportsConstantFolding
Constructor Details
ProcessTableFunction
public ProcessTableFunction()
Method Details
setCollector
Internal use. Sets the current collector.
collect
Emits an (implicit or explicit) output row. If null is emitted as an explicit row, it will be skipped by the runtime. For implicit rows, the row's field will be null.
- Parameters:
row - the output row
getKind
Description copied from interface: FunctionDefinition
Returns the kind of function this definition describes.
getTypeInference
Description copied from class: UserDefinedFunction
Returns the logic for performing type inference of a call to this function definition.
The type inference process is responsible for inferring unknown types of input arguments, validating input arguments, and producing result types. The type inference process happens independent of a function body. The output of the type inference is used to search for a corresponding runtime implementation.
Instances of type inference can be created by using TypeInference.newBuilder(). See BuiltInFunctionDefinitions for concrete usage examples.
The type inference for user-defined functions is automatically extracted using reflection. It does this by analyzing implementation methods such as eval() or accumulate() and the generic parameters of a function class if present. If the reflective information is not sufficient, it can be supported and enriched with DataTypeHint and FunctionHint annotations.
Note: Overriding this method is only recommended for advanced users. If a custom type inference is specified, it is the responsibility of the implementer to make sure that the output of the type inference process matches with the implementation method: The implementation method must comply with each DataType.getConversionClass() returned by the type inference. For example, if DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class) is an expected argument type, the method must accept a call eval(java.sql.Timestamp). Regular Java calling semantics (including type widening and autoboxing) are applied when calling an implementation method, which means that the signature can be eval(java.lang.Object). The runtime will take care of converting the data to the data format specified by the DataType.getConversionClass() coming from the type inference logic.
- Specified by:
getTypeInference in interface FunctionDefinition
- Specified by:
getTypeInference in class UserDefinedFunction