Class DeduplicateFunctionHelper

java.lang.Object
org.apache.flink.table.runtime.operators.deduplicate.utils.DeduplicateFunctionHelper

public class DeduplicateFunctionHelper extends Object
Utility for deduplicate function.

TODO utilize the respective helper classes that inherit from an abstract deduplicate function helper in each deduplicate function.

  • Method Summary

    Modifier and Type
    Method
    Description
    static void
    checkInsertOnly(org.apache.flink.table.data.RowData currentRow)
    check message should be insert only.
    static boolean
    isDuplicate(org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, int rowtimeIndex, boolean keepLastRow)
    Returns current row is duplicate row or not compared to previous row.
    static void
    processFirstRowOnProcTime(org.apache.flink.table.data.RowData currentRow, org.apache.flink.api.common.state.ValueState<Boolean> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)
    Processes element to deduplicate on keys with process time semantic, sends current element if it is first row.
    static void
    processLastRowOnChangelog(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser)
    Processes element to deduplicate on keys, sends current element as last row, retracts previous element if needed.
    static void
    processLastRowOnProcTime(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser)
    Processes element to deduplicate on keys with process time semantic, sends current element as last row, retracts previous element if needed.
    static void
    updateDeduplicateResult(boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)
    Collect the updated result for duplicate row.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • processLastRowOnProcTime

      public static void processLastRowOnProcTime(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser) throws Exception
      Processes element to deduplicate on keys with process time semantic, sends current element as last row, retracts previous element if needed.
      Parameters:
      currentRow - latest row received by deduplicate function
      generateUpdateBefore - whether need to send UPDATE_BEFORE message for updates
      state - state of function, null if generateUpdateBefore is false
      out - underlying collector
      isStateTtlEnabled - whether state ttl is disabled
      equaliser - the record equaliser used to equal RowData.
      Throws:
      Exception
    • processLastRowOnChangelog

      public static void processLastRowOnChangelog(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser) throws Exception
      Processes element to deduplicate on keys, sends current element as last row, retracts previous element if needed.

      Note: we don't support stateless mode yet. Because this is not safe for Kafka tombstone messages which doesn't contain full content. This can be a future improvement if the downstream (e.g. sink) doesn't require full content for DELETE messages.

      Parameters:
      currentRow - latest row received by deduplicate function
      generateUpdateBefore - whether need to send UPDATE_BEFORE message for updates
      state - state of function
      out - underlying collector
      Throws:
      Exception
    • processFirstRowOnProcTime

      public static void processFirstRowOnProcTime(org.apache.flink.table.data.RowData currentRow, org.apache.flink.api.common.state.ValueState<Boolean> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out) throws Exception
      Processes element to deduplicate on keys with process time semantic, sends current element if it is first row.
      Parameters:
      currentRow - latest row received by deduplicate function
      state - state of function
      out - underlying collector
      Throws:
      Exception
    • updateDeduplicateResult

      public static void updateDeduplicateResult(boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)
      Collect the updated result for duplicate row.
      Parameters:
      generateUpdateBefore - flag to generate UPDATE_BEFORE message or not
      generateInsert - flag to generate INSERT message or not
      preRow - previous row under the key
      currentRow - current row under the key which is the duplicate row
      out - underlying collector
    • isDuplicate

      public static boolean isDuplicate(org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, int rowtimeIndex, boolean keepLastRow)
      Returns current row is duplicate row or not compared to previous row.
    • checkInsertOnly

      public static void checkInsertOnly(org.apache.flink.table.data.RowData currentRow)
      check message should be insert only.