Class HashJoinOperator
- All Implemented Interfaces:
Serializable,org.apache.flink.api.common.state.CheckpointListener,org.apache.flink.streaming.api.operators.BoundedMultiInput,org.apache.flink.streaming.api.operators.InputSelectable,org.apache.flink.streaming.api.operators.KeyContext,org.apache.flink.streaming.api.operators.KeyContextHandler,org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>,org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.CheckpointedStreamOperator,org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,,org.apache.flink.table.data.RowData, org.apache.flink.table.data.RowData> org.apache.flink.streaming.api.operators.YieldingOperator<org.apache.flink.table.data.RowData>
The join operator implements the logic of a join operator at runtime. It uses a
hybrid-hash-join internally to match the records with equal key. The build side of the hash is
the first input of the match. It support all join type in HashJoinType.
Note: In order to solve the problem of data skew, or too much data in the hash table, the fallback to sort merge join mechanism is introduced here. If some partitions are spilled to disk more than three times in the process of hash join, it will fallback to sort merge join by default to improve stability. In the future, we will support more flexible adaptive hash join strategy, for example, in the process of building a hash table, if the size of data written to disk reaches a certain threshold, fallback to sort merge join in advance.
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.flink.table.runtime.operators.TableStreamOperator
TableStreamOperator.ContextImpl -
Field Summary
Fields inherited from class org.apache.flink.table.runtime.operators.TableStreamOperator
ctx, currentWatermarkFields inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
combinedWatermark, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager -
Method Summary
Modifier and TypeMethodDescriptionvoidclose()voidendInput(int inputId) abstract voidjoin(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter, org.apache.flink.table.data.RowData probeRow) static HashJoinOperatornewHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, org.apache.flink.table.types.logical.RowType keyType, SortMergeJoinFunction sortMergeJoinFunction) org.apache.flink.streaming.api.operators.InputSelectionvoidopen()voidprocessElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) voidprocessElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) Methods inherited from class org.apache.flink.table.runtime.operators.TableStreamOperator
computeMemorySize, processWatermark, useSplittableTimersMethods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
beforeInitializeStateHandler, finish, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isAsyncStateProcessingEnabled, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, snapshotStateMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface org.apache.flink.api.common.state.CheckpointListener
notifyCheckpointAborted, notifyCheckpointCompleteMethods inherited from interface org.apache.flink.streaming.api.operators.KeyContext
getCurrentKey, setCurrentKeyMethods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler
hasKeyContextMethods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator
finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotStateMethods inherited from interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator
processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark1, processWatermark2, processWatermark2, processWatermarkStatus1, processWatermarkStatus2
-
Method Details
-
open
- Specified by:
openin interfaceorg.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>- Overrides:
openin classTableStreamOperator<org.apache.flink.table.data.RowData>- Throws:
Exception
-
processElement1
public void processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception - Specified by:
processElement1in interfaceorg.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData, org.apache.flink.table.data.RowData> - Throws:
Exception
-
processElement2
public void processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception - Specified by:
processElement2in interfaceorg.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData, org.apache.flink.table.data.RowData> - Throws:
Exception
-
nextSelection
public org.apache.flink.streaming.api.operators.InputSelection nextSelection()- Specified by:
nextSelectionin interfaceorg.apache.flink.streaming.api.operators.InputSelectable
-
endInput
- Specified by:
endInputin interfaceorg.apache.flink.streaming.api.operators.BoundedMultiInput- Throws:
Exception
-
join
public abstract void join(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter, org.apache.flink.table.data.RowData probeRow) throws Exception - Throws:
Exception
-
close
- Specified by:
closein interfaceorg.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>- Overrides:
closein classorg.apache.flink.streaming.api.operators.AbstractStreamOperator<org.apache.flink.table.data.RowData>- Throws:
Exception
-
newHashJoinOperator
public static HashJoinOperator newHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, org.apache.flink.table.types.logical.RowType keyType, SortMergeJoinFunction sortMergeJoinFunction)
-