Class DeduplicateFunctionHelper
java.lang.Object
org.apache.flink.table.runtime.operators.deduplicate.utils.DeduplicateFunctionHelper
Utility for deduplicate function.
TODO utilize the respective helper classes that inherit from an abstract deduplicate function helper in each deduplicate function.
-
Method Summary
Modifier and Type | Method | Description
static void | checkInsertOnly(org.apache.flink.table.data.RowData currentRow) | Checks that the message is insert-only.
static boolean | isDuplicate(org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, int rowtimeIndex, boolean keepLastRow) | Returns whether the current row is a duplicate row compared to the previous row.
static void | processFirstRowOnProcTime(org.apache.flink.table.data.RowData currentRow, org.apache.flink.api.common.state.ValueState<Boolean> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out) | Processes an element to deduplicate on keys with processing-time semantics; sends the current element if it is the first row.
static void | processLastRowOnChangelog(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser) | Processes an element to deduplicate on keys; sends the current element as the last row and retracts the previous element if needed.
static void | processLastRowOnProcTime(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser) | Processes an element to deduplicate on keys with processing-time semantics; sends the current element as the last row and retracts the previous element if needed.
static void | updateDeduplicateResult(boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out) | Collects the updated result for a duplicate row.
-
Method Details
-
processLastRowOnProcTime
public static void processLastRowOnProcTime(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser) throws Exception
Processes an element to deduplicate on keys with processing-time semantics; sends the current element as the last row and retracts the previous element if needed.
Parameters:
currentRow - latest row received by the deduplicate function
generateUpdateBefore - whether to send an UPDATE_BEFORE message for updates
generateInsert - flag to generate INSERT message or not
state - state of the function, null if generateUpdateBefore is false
out - underlying collector
isStateTtlEnabled - whether state TTL is enabled
equaliser - the record equaliser used to compare RowData for equality
Throws:
Exception
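The interaction of the flags above can be hard to picture from the parameter list alone. The following is a minimal sketch, not the Flink implementation: it uses plain Java stand-ins (a String field for the ValueState, a List for the Collector, Objects.equals for the RecordEqualiser), all of which are hypothetical. It also assumes, without confirmation from this page, that an unchanged row is suppressed only while state TTL is disabled, and that state is skipped entirely when neither flag is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Illustrative only: plain Java stand-ins, not the Flink classes. */
final class LastRowProcTimeSketch {
    // Hypothetical stand-in for ValueState<RowData>; holds the last row of one key.
    private String previous;
    // Hypothetical stand-in for Collector<RowData>; messages are tagged +I, -U, +U.
    final List<String> out = new ArrayList<>();

    void process(String current, boolean generateUpdateBefore,
                 boolean generateInsert, boolean isStateTtlEnabled) {
        if (!generateUpdateBefore && !generateInsert) {
            // Assumption: first rows need not be told apart from updates, so no state is used.
            out.add("+U(" + current + ")");
            return;
        }
        String pre = previous;
        previous = current;
        if (pre == null) {
            out.add("+I(" + current + ")"); // first row seen for this key
            return;
        }
        // Assumption: an unchanged row is dropped only while state TTL is off.
        if (!isStateTtlEnabled && Objects.equals(pre, current)) {
            return;
        }
        if (generateUpdateBefore) {
            out.add("-U(" + pre + ")"); // retract the previous row
        }
        out.add("+U(" + current + ")");
    }

    /** Feeds the rows of a single key through a fresh sketch and returns the emitted messages. */
    static List<String> run(boolean generateUpdateBefore, boolean generateInsert,
                            boolean isStateTtlEnabled, String... rows) {
        LastRowProcTimeSketch sketch = new LastRowProcTimeSketch();
        for (String row : rows) {
            sketch.process(row, generateUpdateBefore, generateInsert, isStateTtlEnabled);
        }
        return sketch.out;
    }

    public static void main(String[] args) {
        // The second "a" is identical to the stored row and TTL is off, so it is dropped.
        System.out.println(run(true, true, false, "a", "a", "b"));
    }
}
```

Under these assumptions, the main method prints `[+I(a), -U(a), +U(b)]`.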
-
processLastRowOnChangelog
public static void processLastRowOnChangelog(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser) throws Exception
Processes an element to deduplicate on keys; sends the current element as the last row and retracts the previous element if needed.
Note: stateless mode is not supported yet, because it is not safe for Kafka tombstone messages, which don't contain the full content. This can be a future improvement if the downstream (e.g. sink) doesn't require full content for DELETE messages.
Parameters:
currentRow - latest row received by the deduplicate function
generateUpdateBefore - whether to send an UPDATE_BEFORE message for updates
state - state of the function
out - underlying collector
isStateTtlEnabled - whether state TTL is enabled
equaliser - the record equaliser used to compare RowData for equality
Throws:
Exception
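The note on Kafka tombstones suggests why state is needed on a changelog input: a delete may arrive without the row's content, so the content of the outgoing DELETE has to come from somewhere else. The sketch below illustrates that idea with hypothetical stand-ins (a two-value Kind enum instead of the row kind of a RowData, a String field instead of ValueState, a List instead of Collector). It assumes the stored row is emitted on delete, and it leaves out the isStateTtlEnabled and equaliser handling; it is not the Flink implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain Java stand-ins, not the Flink classes. */
final class LastRowChangelogSketch {
    /** Hypothetical, simplified change kinds standing in for the row kind of a RowData. */
    enum Kind { UPSERT, DELETE }

    // Hypothetical stand-in for ValueState<RowData>; holds the full last row of one key.
    private String stored;
    // Hypothetical stand-in for Collector<RowData>; messages are tagged +I, -U, +U, -D.
    final List<String> out = new ArrayList<>();

    void process(Kind kind, String payload, boolean generateUpdateBefore) {
        if (kind == Kind.UPSERT) {
            if (stored == null) {
                out.add("+I(" + payload + ")"); // first row for this key
            } else {
                if (generateUpdateBefore) {
                    out.add("-U(" + stored + ")"); // retract the previous row
                }
                out.add("+U(" + payload + ")");
            }
            stored = payload; // remember the full content for later retractions
        } else if (stored != null) {
            // The incoming delete may carry only the key (payload can be null),
            // so the stored row supplies the content of the DELETE message.
            out.add("-D(" + stored + ")");
            stored = null;
        }
        // A delete for a key with no stored row emits nothing.
    }

    public static void main(String[] args) {
        LastRowChangelogSketch sketch = new LastRowChangelogSketch();
        sketch.process(Kind.UPSERT, "id=1,name=a", true);
        sketch.process(Kind.UPSERT, "id=1,name=b", true);
        sketch.process(Kind.DELETE, null, true); // tombstone: key only, no content
        System.out.println(sketch.out);
    }
}
```

In this sketch the tombstone still produces `-D(id=1,name=b)`, because the content is taken from the stored row rather than from the incoming message.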
-
processFirstRowOnProcTime
public static void processFirstRowOnProcTime(org.apache.flink.table.data.RowData currentRow, org.apache.flink.api.common.state.ValueState<Boolean> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out) throws Exception
Processes an element to deduplicate on keys with processing-time semantics; sends the current element if it is the first row.
Parameters:
currentRow - latest row received by the deduplicate function
state - state of the function
out - underlying collector
Throws:
Exception
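Keep-first-row deduplication only needs to remember whether a key has been seen, which is consistent with the ValueState<Boolean> in the signature. A minimal sketch of that idea follows. The Map stands in for Flink's per-key ValueState<Boolean> and the List for the Collector; both are hypothetical substitutes chosen so the example runs without Flink, and the code is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: plain Java stand-ins, not the Flink classes. */
final class FirstRowSketch {
    // Hypothetical stand-in for keyed ValueState<Boolean>: one flag per key.
    private final Map<String, Boolean> seen = new HashMap<>();
    // Hypothetical stand-in for Collector<RowData>.
    final List<String> out = new ArrayList<>();

    void process(String key, String row) {
        if (seen.get(key) != null) {
            return; // a row for this key was already emitted, so drop this one
        }
        seen.put(key, Boolean.TRUE);
        out.add(row);
    }

    public static void main(String[] args) {
        FirstRowSketch sketch = new FirstRowSketch();
        sketch.process("k1", "a");
        sketch.process("k1", "b"); // dropped: k1 already has a first row
        sketch.process("k2", "c");
        System.out.println(sketch.out);
    }
}
```

The main method prints `[a, c]`: only the first row of each key reaches the output.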
-
updateDeduplicateResult
public static void updateDeduplicateResult(boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)
Collects the updated result for a duplicate row.
Parameters:
generateUpdateBefore - flag to generate UPDATE_BEFORE message or not
generateInsert - flag to generate INSERT message or not
preRow - previous row under the key
currentRow - current row under the key, which is the duplicate row
out - underlying collector
-
isDuplicate
public static boolean isDuplicate(org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, int rowtimeIndex, boolean keepLastRow)
Returns whether the current row is a duplicate row compared to the previous row.
-
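The isDuplicate signature takes a rowtimeIndex and a keepLastRow flag, but this page does not spell out the comparison. One plausible reading, stated here as an assumption rather than documented behavior, is that the method returns true when the current row should become the new result for its key: with keepLastRow, when its rowtime is not earlier than the previous row's; otherwise, when its rowtime is strictly earlier. The sketch below models rows as long[] arrays, a hypothetical stand-in for RowData.

```java
/** Illustrative only: rows are plain long[] arrays, not Flink RowData. */
final class IsDuplicateSketch {
    private IsDuplicateSketch() {}

    /**
     * Assumed semantics: true means the current row supersedes the previous one.
     * The rowtime of each row is read from position rowtimeIndex.
     */
    static boolean isDuplicate(long[] preRow, long[] currentRow,
                               int rowtimeIndex, boolean keepLastRow) {
        if (preRow == null) {
            return true; // nothing stored yet for this key
        }
        long pre = preRow[rowtimeIndex];
        long cur = currentRow[rowtimeIndex];
        // Keep-last accepts equal or later rowtimes; keep-first accepts strictly earlier ones.
        return keepLastRow ? pre <= cur : cur < pre;
    }

    public static void main(String[] args) {
        long[] early = {1L, 100L}; // {key, rowtime}
        long[] late = {1L, 200L};
        System.out.println(isDuplicate(early, late, 1, true));
        System.out.println(isDuplicate(early, late, 1, false));
    }
}
```

With these inputs the sketch prints `true` and then `false`: the later row replaces the earlier one only when the last row is being kept.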
checkInsertOnly
public static void checkInsertOnly(org.apache.flink.table.data.RowData currentRow)
Checks that the message is insert-only.
-