Class BinaryHashTable
java.lang.Object
org.apache.flink.table.runtime.hashtable.BaseHybridHashTable
org.apache.flink.table.runtime.hashtable.BinaryHashTable
- All Implemented Interfaces:
org.apache.flink.core.memory.MemorySegmentSource, MemorySegmentPool
An implementation of a Hybrid Hash Join. The join starts operating in memory and gradually spills contents to disk when memory becomes insufficient. It does not need to know a priori how large the input will be.
The design of this class follows in many parts the design presented in "Hash joins and hash teams in Microsoft SQL Server", by Goetz Graefe et al. In its current state, the implementation lacks features like dynamic role reversal, partition tuning, or histogram-guided partitioning.
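The hybrid hash join idea described above can be illustrated with a small, self-contained toy. This is a minimal sketch under stated assumptions, not Flink's implementation: keys are `int`s, "disk" is simulated with in-memory lists, the memory budget is counted in rows rather than memory segments, the victim chosen on overflow is simply the largest in-memory partition, and all names (`ToyHybridHashJoin`, `Row`, `finishSpilled`) are hypothetical. Probe rows whose partition has already been spilled are deferred and re-matched after the probe input ends, which parallels what this page documents for `tryProbe` and `nextMatching`. Requires Java 16+ (records).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy hybrid hash join over int keys. NOT Flink's BinaryHashTable: "disk" is simulated
// with lists, and the memory budget is counted in rows instead of memory segments.
final class ToyHybridHashJoin {
    record Row(int key, String value) {}

    // One build-side hash partition plus the probe rows deferred to it.
    static final class Partition {
        final Map<Integer, List<Row>> table = new HashMap<>(); // in-memory hash table
        final List<Row> spilledBuild = new ArrayList<>();       // simulated build spill file
        final List<Row> spilledProbe = new ArrayList<>();       // simulated probe spill file
        boolean inMemory = true;
        int size = 0;                                           // build rows held in memory
    }

    private final Partition[] partitions;
    private final int memoryBudgetRows;
    private int rowsInMemory = 0;

    ToyHybridHashJoin(int numPartitions, int memoryBudgetRows) {
        this.partitions = new Partition[numPartitions];
        for (int i = 0; i < numPartitions; i++) partitions[i] = new Partition();
        this.memoryBudgetRows = memoryBudgetRows;
    }

    private Partition partitionFor(int key) {
        return partitions[Math.floorMod(Integer.hashCode(key), partitions.length)];
    }

    // Build phase: keep rows in memory; on overflow, spill the largest in-memory partition.
    void putBuildRow(Row row) {
        Partition p = partitionFor(row.key());
        if (!p.inMemory) { p.spilledBuild.add(row); return; }
        p.table.computeIfAbsent(row.key(), k -> new ArrayList<>()).add(row);
        p.size++;
        rowsInMemory++;
        while (rowsInMemory > memoryBudgetRows) spillLargest();
    }

    private void spillLargest() {
        Partition victim = null;
        for (Partition p : partitions) {
            if (p.inMemory && (victim == null || p.size > victim.size)) victim = p;
        }
        // Non-null here: rowsInMemory > budget implies some in-memory partition holds rows.
        for (List<Row> rows : victim.table.values()) victim.spilledBuild.addAll(rows);
        victim.table.clear();
        rowsInMemory -= victim.size;
        victim.size = 0;
        victim.inMemory = false;
    }

    // Probe phase: returns false, and defers the row, when its partition was spilled.
    boolean tryProbe(Row probe, List<String> out) {
        Partition p = partitionFor(probe.key());
        if (!p.inMemory) { p.spilledProbe.add(probe); return false; }
        for (Row b : p.table.getOrDefault(probe.key(), List.of())) out.add(b.value() + "|" + probe.value());
        return true;
    }

    // After the probe input ends: rebuild each spilled partition and re-match its deferred rows.
    // A real implementation would re-partition a rebuilt partition that still does not fit.
    void finishSpilled(List<String> out) {
        for (Partition p : partitions) {
            if (p.inMemory) continue;
            Map<Integer, List<Row>> rebuilt = new HashMap<>();
            for (Row b : p.spilledBuild) rebuilt.computeIfAbsent(b.key(), k -> new ArrayList<>()).add(b);
            for (Row probe : p.spilledProbe) {
                for (Row b : rebuilt.getOrDefault(probe.key(), List.of())) out.add(b.value() + "|" + probe.value());
            }
        }
    }

    public static void main(String[] args) {
        ToyHybridHashJoin join = new ToyHybridHashJoin(2, 2); // 2 partitions, room for 2 rows
        int[] keys = {1, 2, 3, 4, 1};
        String[] values = {"a", "b", "c", "d", "e"};
        for (int i = 0; i < keys.length; i++) join.putBuildRow(new Row(keys[i], values[i]));
        List<String> out = new ArrayList<>();
        for (Row probe : List.of(new Row(1, "x"), new Row(3, "y"), new Row(5, "z"), new Row(4, "w"))) {
            join.tryProbe(probe, out);
        }
        join.finishSpilled(out);
        Collections.sort(out);
        System.out.println(out); // [a|x, c|y, d|w, e|x]
    }
}
```

In the example, the partition holding keys 1 and 3 is spilled during the build, yet the final result equals that of a purely in-memory join; spilling only changes when a match is produced, not whether it is produced.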
Field Summary
Fields inherited from class org.apache.flink.table.runtime.hashtable.BaseHybridHashTable
buildRowCount, buildSpillRetBufferNumbers, buildSpillReturnBuffers, closed, compressionBlockSize, compressionCodecFactory, compressionEnabled, currentEnumerator, currentRecursionDepth, currentSpilledBuildSide, currentSpilledProbeSide, initPartitionFanOut, internalPool, ioManager, LOG, MAX_NUM_PARTITIONS, MAX_RECURSION_DEPTH, numSpillFiles, segmentSize, segmentSizeBits, segmentSizeMask, spillInBytes, totalNumBuffers, tryDistinctBuildRow
Constructor Summary
Constructors
BinaryHashTable(Object owner, boolean compressionEnabled, int compressionBlockSize, AbstractRowDataSerializer buildSideSerializer, AbstractRowDataSerializer probeSideSerializer, Projection<org.apache.flink.table.data.RowData, org.apache.flink.table.data.binary.BinaryRowData> buildSideProjection, Projection<org.apache.flink.table.data.RowData, org.apache.flink.table.data.binary.BinaryRowData> probeSideProjection, org.apache.flink.runtime.memory.MemoryManager memManager, long reservedMemorySize, org.apache.flink.runtime.io.disk.iomanager.IOManager ioManager, int avgRecordLen, long buildRowCount, boolean useBloomFilters, HashJoinType type, JoinCondition condFunc, boolean reverseJoin, boolean[] filterNulls, boolean tryDistinctBuildRow)
Method Summary
Modifier and Type | Method | Description
void | clearPartitions() | This method clears all partitions currently residing (partially) in memory.
void | endBuild() | Ends the build phase.
RowIterator<org.apache.flink.table.data.binary.BinaryRowData> | getBuildSideIterator()
org.apache.flink.table.data.RowData | getCurrentProbeRow()
boolean | nextMatching() | Moves to the next record from a rebuilt spilled partition or from the build-side outer partition.
void | putBuildRow(org.apache.flink.table.data.RowData row) | Puts a build-side row into the hash table.
protected int | spillPartition() | Selects a partition and spills it.
boolean | tryProbe(org.apache.flink.table.data.RowData record) | Finds matching build-side rows for a probe row.
Methods inherited from class org.apache.flink.table.runtime.hashtable.BaseHybridHashTable
close, createInputView, ensureNumBuffersReturned, free, freeCurrent, freePages, getNextBuffer, getNextBuffers, getNotNullNextBuffer, getNumSpillFiles, getSpillInBytes, getUsedMemoryInBytes, hash, maxInitBufferOfBucketArea, maxNumPartition, nextSegment, pageSize, readAllBuffers, releaseMemoryCacheForSMJ, remainBuffers, returnAll, returnPage
Constructor Details
BinaryHashTable
public BinaryHashTable(Object owner, boolean compressionEnabled, int compressionBlockSize, AbstractRowDataSerializer buildSideSerializer, AbstractRowDataSerializer probeSideSerializer, Projection<org.apache.flink.table.data.RowData, org.apache.flink.table.data.binary.BinaryRowData> buildSideProjection, Projection<org.apache.flink.table.data.RowData, org.apache.flink.table.data.binary.BinaryRowData> probeSideProjection, org.apache.flink.runtime.memory.MemoryManager memManager, long reservedMemorySize, org.apache.flink.runtime.io.disk.iomanager.IOManager ioManager, int avgRecordLen, long buildRowCount, boolean useBloomFilters, HashJoinType type, JoinCondition condFunc, boolean reverseJoin, boolean[] filterNulls, boolean tryDistinctBuildRow)
Method Details
putBuildRow
Puts a build-side row into the hash table.
- Throws:
IOException
endBuild
Ends the build phase.
- Throws:
IOException
tryProbe
Finds matching build-side rows for a probe row.
- Returns:
false if the target partition has been spilled; in that case the probe row is spilled as well and is re-matched in the rebuild phase. true otherwise.
- Throws:
IOException
nextMatching
Moves to the next record from a rebuilt spilled partition or from the build-side outer partition.
- Throws:
IOException
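Taken together, the probe-side descriptions suggest a calling protocol: call `tryProbe` for each probe row and consume matches only when it returns true; after the probe input is exhausted, call `nextMatching` until it returns false so that rows deferred to spilled partitions are joined too. The sketch below shows that driver loop against a hypothetical, simplified `ProbeTable` interface, which is not a Flink type. It assumes that `getCurrentProbeRow` and `getBuildSideIterator` expose the current probe row and its matching build rows (this page gives them no description), uses `java.util.Iterator` in place of Flink's `RowIterator`, drops the `IOException`s, and omits outer-join handling.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for the probe-side methods of BinaryHashTable.
// Real signatures differ (RowData/BinaryRowData, RowIterator, throws IOException).
interface ProbeTable<R> {
    boolean tryProbe(R probeRow);       // false: the row was deferred to a spilled partition
    boolean nextMatching();             // moves to the next record from a rebuilt spilled partition
    R getCurrentProbeRow();             // assumed: probe row of the current match
    Iterator<R> getBuildSideIterator(); // assumed: build rows matching the current probe row
}

final class ProbeDriver {
    // Joins every probe row: in-memory matches immediately, deferred rows after the input ends.
    static <R> List<String> run(ProbeTable<R> table, Iterable<R> probeInput) {
        List<String> out = new ArrayList<>();
        for (R row : probeInput) {
            if (table.tryProbe(row)) {
                emitMatches(table, out);
            }
            // On false the row is not lost: it resurfaces through nextMatching() below.
        }
        while (table.nextMatching()) {
            emitMatches(table, out);
        }
        return out;
    }

    private static <R> void emitMatches(ProbeTable<R> table, List<String> out) {
        R probe = table.getCurrentProbeRow();
        for (Iterator<R> it = table.getBuildSideIterator(); it.hasNext(); ) {
            out.add(it.next() + "|" + probe);
        }
    }
}
```

The two-phase loop is what keeps a spilled probe row from being dropped: `tryProbe` returning false is not "no match", only "not yet".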
getCurrentProbeRow
public org.apache.flink.table.data.RowData getCurrentProbeRow()
getBuildSideIterator
clearPartitions
public void clearPartitions()
This method clears all partitions currently residing (partially) in memory. It releases all memory and deletes all spilled partitions. This method is intended for a hard cleanup when the join is aborted.
- Specified by:
clearPartitions in class BaseHybridHashTable
spillPartition
Selects a partition and spills it. The number of the spilled partition is returned.
- Specified by:
spillPartition in class BaseHybridHashTable
- Returns:
The number of the spilled partition.
- Throws:
IOException
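`spillPartition` does not say how the partition to spill is chosen. A common policy in hybrid hash joins, assumed here rather than taken from this page, is to spill the in-memory partition that occupies the most memory segments, since a single spill then frees the most memory. The hypothetical `PartitionStats` record and `selectPartitionToSpill` method sketch that choice; the returned index plays the role of "the number of the spilled partition". Requires Java 16+ (records).

```java
import java.util.List;

// Hypothetical snapshot of one partition; Flink's real partition type is not modeled here.
record PartitionStats(boolean inMemory, int occupiedSegments) {}

final class SpillPolicy {
    // Largest-first victim selection: returns the index of the in-memory partition that
    // occupies the most memory segments, or -1 if no in-memory partition holds any.
    // Ties go to the lowest index because the comparison is strict.
    static int selectPartitionToSpill(List<PartitionStats> partitions) {
        int victim = -1;
        int victimSegments = 0;
        for (int i = 0; i < partitions.size(); i++) {
            PartitionStats p = partitions.get(i);
            if (p.inMemory() && p.occupiedSegments() > victimSegments) {
                victim = i;
                victimSegments = p.occupiedSegments();
            }
        }
        return victim;
    }

    public static void main(String[] args) {
        List<PartitionStats> parts = List.of(
                new PartitionStats(true, 3),
                new PartitionStats(false, 10), // already spilled: never chosen again
                new PartitionStats(true, 5),
                new PartitionStats(true, 5));
        System.out.println(selectPartitionToSpill(parts)); // 2
    }
}
```

Skipping partitions that are already on disk matters: their remaining in-memory footprint is only write buffers, so spilling them again would free little or nothing.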
getPartitionsPendingForSMJ
getSpilledPartitionBuildSideIter
- Throws:
IOException
getSpilledPartitionProbeSideIter
- Throws:
IOException