Class HashJoinOperator

java.lang.Object
org.apache.flink.streaming.api.operators.AbstractStreamOperator<OUT>
org.apache.flink.table.runtime.operators.TableStreamOperator<org.apache.flink.table.data.RowData>
org.apache.flink.table.runtime.operators.join.HashJoinOperator
All Implemented Interfaces:
Serializable, org.apache.flink.api.common.state.CheckpointListener, org.apache.flink.streaming.api.operators.BoundedMultiInput, org.apache.flink.streaming.api.operators.InputSelectable, org.apache.flink.streaming.api.operators.KeyContext, org.apache.flink.streaming.api.operators.KeyContextHandler, org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.CheckpointedStreamOperator, org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.YieldingOperator<org.apache.flink.table.data.RowData>

public abstract class HashJoinOperator extends TableStreamOperator<org.apache.flink.table.data.RowData> implements org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.BoundedMultiInput, org.apache.flink.streaming.api.operators.InputSelectable
Hash join base operator.

The join operator implements the logic of a join operator at runtime. It uses a hybrid-hash-join internally to match the records with equal key. The build side of the hash is the first input of the match. It support all join type in HashJoinType.

Note: In order to solve the problem of data skew, or too much data in the hash table, the fallback to sort merge join mechanism is introduced here. If some partitions are spilled to disk more than three times in the process of hash join, it will fallback to sort merge join by default to improve stability. In the future, we will support more flexible adaptive hash join strategy, for example, in the process of building a hash table, if the size of data written to disk reaches a certain threshold, fallback to sort merge join in advance.

See Also:
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.flink.table.runtime.operators.TableStreamOperator

    TableStreamOperator.ContextImpl
  • Field Summary

    Fields inherited from class org.apache.flink.table.runtime.operators.TableStreamOperator

    ctx, currentWatermark

    Fields inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator

    combinedWatermark, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager
  • Method Summary

    Modifier and Type
    Method
    Description
    void
     
    void
    endInput(int inputId)
     
    abstract void
    join(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter, org.apache.flink.table.data.RowData probeRow)
     
    newHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, org.apache.flink.table.types.logical.RowType keyType, SortMergeJoinFunction sortMergeJoinFunction)
     
    org.apache.flink.streaming.api.operators.InputSelection
     
    void
     
    void
    processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
     
    void
    processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
     

    Methods inherited from class org.apache.flink.table.runtime.operators.TableStreamOperator

    computeMemorySize, processWatermark, useSplittableTimers

    Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator

    beforeInitializeStateHandler, finish, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isAsyncStateProcessingEnabled, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, snapshotState

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.flink.api.common.state.CheckpointListener

    notifyCheckpointAborted, notifyCheckpointComplete

    Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContext

    getCurrentKey, setCurrentKey

    Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler

    hasKeyContext

    Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator

    finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState

    Methods inherited from interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator

    processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus1, processWatermarkStatus2
  • Method Details

    • open

      public void open() throws Exception
      Specified by:
      open in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
      Overrides:
      open in class TableStreamOperator<org.apache.flink.table.data.RowData>
      Throws:
      Exception
    • processElement1

      public void processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception
      Specified by:
      processElement1 in interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>
      Throws:
      Exception
    • processElement2

      public void processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception
      Specified by:
      processElement2 in interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>
      Throws:
      Exception
    • nextSelection

      public org.apache.flink.streaming.api.operators.InputSelection nextSelection()
      Specified by:
      nextSelection in interface org.apache.flink.streaming.api.operators.InputSelectable
    • endInput

      public void endInput(int inputId) throws Exception
      Specified by:
      endInput in interface org.apache.flink.streaming.api.operators.BoundedMultiInput
      Throws:
      Exception
    • join

      public abstract void join(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter, org.apache.flink.table.data.RowData probeRow) throws Exception
      Throws:
      Exception
    • close

      public void close() throws Exception
      Specified by:
      close in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
      Overrides:
      close in class org.apache.flink.streaming.api.operators.AbstractStreamOperator<org.apache.flink.table.data.RowData>
      Throws:
      Exception
    • newHashJoinOperator

      public static HashJoinOperator newHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, org.apache.flink.table.types.logical.RowType keyType, SortMergeJoinFunction sortMergeJoinFunction)