object MergeScalarSubqueries extends Rule[LogicalPlan] with PredicateHelper
This rule tries to merge multiple non-correlated ScalarSubquerys so that multiple scalar values are computed only once.
The process is the following:
- While traversing the plan, the rule tries to merge each ScalarSubquery plan into the cache
of already seen subquery plans. If the merge is possible, the cache is updated with the merged
subquery plan; if not, the new subquery plan is added to the cache.
During this first traversal each ScalarSubquery expression is replaced with a temporary
ScalarSubqueryReference pointing to its cached version.
The cache uses a flag to track whether a cache entry is the result of merging 2 or more
plans, or is a plan that was seen only once.
Merged plans in the cache get a "Header" that contains the list of attributes that form the scalar
return value of a merged subquery.
- A second traversal checks whether there are merged subqueries in the cache and builds a WithCTE
node from these queries. The CTERelationDef nodes contain the merged subquery in the
following form:
Project(Seq(CreateNamedStruct(name1, attribute1, ...) AS mergedValue), mergedSubqueryPlan)
and the definitions are flagged as hosting a subquery that can return at most one row.
During the second traversal each ScalarSubqueryReference expression is either transformed to a
GetStructField(ScalarSubquery(CTERelationRef(...))) expression, if it points to a merged
subquery, or restored to the original ScalarSubquery.
E.g. the following query:
SELECT (SELECT avg(a) FROM t), (SELECT sum(a) FROM t)
is optimized from:
Optimized Logical Plan
Project [scalar-subquery#242 [] AS scalarsubquery()#253,
         scalar-subquery#243 [] AS scalarsubquery()#254L]
:  :- Aggregate [avg(a#244) AS avg(a)#247]
:  :  +- Project [a#244]
:  :     +- Relation default.t[a#244,b#245] parquet
:  +- Aggregate [sum(a#251) AS sum(a)#250L]
:     +- Project [a#251]
:        +- Relation default.t[a#251,b#252] parquet
+- OneRowRelation
to:
Optimized Logical Plan
Project [scalar-subquery#242 [].avg(a) AS scalarsubquery()#253,
         scalar-subquery#243 [].sum(a) AS scalarsubquery()#254L]
:  :- Project [named_struct(avg(a), avg(a)#247, sum(a), sum(a)#250L) AS mergedValue#260]
:  :  +- Aggregate [avg(a#244) AS avg(a)#247, sum(a#244) AS sum(a)#250L]
:  :     +- Project [a#244]
:  :        +- Relation default.t[a#244,b#245] parquet
:  +- Project [named_struct(avg(a), avg(a)#247, sum(a), sum(a)#250L) AS mergedValue#260]
:     +- Aggregate [avg(a#244) AS avg(a)#247, sum(a#244) AS sum(a)#250L]
:        +- Project [a#244]
:           +- Relation default.t[a#244,b#245] parquet
+- OneRowRelation
Physical Plan
*(1) Project [Subquery scalar-subquery#242, [id=#125].avg(a) AS scalarsubquery()#253,
              ReusedSubquery Subquery scalar-subquery#242, [id=#125].sum(a) AS scalarsubquery()#254L]
:  :- Subquery scalar-subquery#242, [id=#125]
:  :  +- *(2) Project [named_struct(avg(a), avg(a)#247, sum(a), sum(a)#250L) AS mergedValue#260]
:  :     +- *(2) HashAggregate(keys=[], functions=[avg(a#244), sum(a#244)], output=[avg(a)#247, sum(a)#250L])
:  :        +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#120]
:  :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(a#244), partial_sum(a#244)], output=[sum#262, count#263L, sum#264L])
:  :              +- *(1) ColumnarToRow
:  :                 +- FileScan parquet default.t[a#244] ...
:  +- ReusedSubquery Subquery scalar-subquery#242, [id=#125]
+- *(1) Scan OneRowRelation[]
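The two traversals described above can be illustrated with a toy model. This is a simplified sketch and not Spark's implementation: `ToySubquery`, `CacheEntry`, `Ref`, `firstPass` and `secondPass` are hypothetical names, a plan is reduced to a table name plus a list of aggregate expressions, and two subqueries are assumed to be mergeable only when they read the same table.

```scala
object MergeSketch {
  // Toy "plan": aggregates computed over a single table (hypothetical type).
  final case class ToySubquery(table: String, aggregates: Vector[String])

  // Toy cache entry: the (possibly merged) plan plus the merged flag.
  final case class CacheEntry(plan: ToySubquery, merged: Boolean)

  // Temporary reference: which cache entry, and which field of its output.
  final case class Ref(cacheIndex: Int, fieldIndex: Int)

  // First traversal: merge each subquery into the cache if possible, else add it.
  // Assumption of this sketch: every incoming subquery has exactly one aggregate.
  def firstPass(subqueries: Seq[ToySubquery]): (Vector[CacheEntry], Vector[Ref]) =
    subqueries.foldLeft((Vector.empty[CacheEntry], Vector.empty[Ref])) {
      case ((cache, refs), sq) =>
        val agg = sq.aggregates.head
        cache.indexWhere(_.plan.table == sq.table) match {
          case -1 =>
            // Nothing to merge with: add a new, non-merged entry.
            (cache :+ CacheEntry(sq, merged = false), refs :+ Ref(cache.size, 0))
          case i =>
            val entry = cache(i)
            val existing = entry.plan.aggregates.indexOf(agg)
            if (existing >= 0) {
              // Identical subquery: reuse the entry unchanged.
              (cache, refs :+ Ref(i, existing))
            } else {
              // Mergeable subquery: extend the cached plan and flag it as merged.
              val mergedPlan = entry.plan.copy(aggregates = entry.plan.aggregates :+ agg)
              (cache.updated(i, CacheEntry(mergedPlan, merged = true)),
                refs :+ Ref(i, mergedPlan.aggregates.size - 1))
            }
        }
    }

  // Second traversal: a reference to a merged entry becomes a field lookup on the
  // shared result; a reference to a non-merged entry is restored to a plain subquery.
  def secondPass(cache: Vector[CacheEntry], refs: Vector[Ref]): Vector[String] =
    refs.map { ref =>
      val entry = cache(ref.cacheIndex)
      val agg = entry.plan.aggregates(ref.fieldIndex)
      if (entry.merged) s"merged#${ref.cacheIndex}.$agg"
      else s"(SELECT $agg FROM ${entry.plan.table})"
    }
}
```

In this model, feeding `avg(a)` and `sum(a)` over the same table through `firstPass` leaves a single cache entry with `merged = true` and two references to it, which mirrors how both scalar subqueries in the plans above read fields of the same `mergedValue` struct.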
Type Members
-
case class
Header(attributes: Seq[Attribute], plan: LogicalPlan, merged: Boolean) extends Product with Serializable
An item in the cache of merged scalar subqueries.
- attributes
Attributes that form the struct scalar return value of a merged subquery.
- plan
The plan of a merged scalar subquery.
- merged
A flag to identify if this item is the result of merging subqueries. Please note that
attributes.size == 1 doesn't always mean that the plan is not merged, as there can be subqueries that are different (checkIdenticalPlans is false) due to an extra Project node in one of them. In that case attributes.size remains 1 after merging, but the merged flag becomes true.
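The note on the merged flag can be made concrete with a small stand-in for Header. This is only an illustrative sketch: Attribute and LogicalPlan are replaced by plain strings here, and `needsSharedDefinition` is a hypothetical helper, not part of the rule.

```scala
object HeaderSketch {
  // Stand-ins for Spark's Attribute and LogicalPlan, used for illustration only.
  type Attribute = String
  type LogicalPlan = String

  final case class Header(attributes: Seq[Attribute], plan: LogicalPlan, merged: Boolean)

  // A subquery that was seen once and never merged.
  val seenOnce: Header =
    Header(Seq("avg(a)"), "Aggregate[avg(a)] <- Relation t", merged = false)

  // Two subqueries that differ only by an extra Project node: after merging there
  // is still a single attribute, but the entry is flagged as merged.
  val mergedWithOneAttribute: Header =
    Header(Seq("avg(a)"), "Aggregate[avg(a)] <- Project[a] <- Relation t", merged = true)

  // Hypothetical helper: decide by the flag, never by attributes.size.
  def needsSharedDefinition(header: Header): Boolean = header.merged
}
```

Both entries have `attributes.size == 1`, so only the flag distinguishes them.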
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
apply(plan: LogicalPlan): LogicalPlan
- Definition Classes
- MergeScalarSubqueries → Rule
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
buildBalancedPredicate(expressions: Seq[Expression], op: (Expression, Expression) ⇒ Expression): Expression
Builds a balanced output predicate in a bottom-up approach, by applying the binary operator op pair by pair on the input predicates (written exprs in the examples) recursively.
Example:
exprs = [a, b, c, d], op = And, returns (a And b) And (c And d)
exprs = [a, b, c, d, e, f], op = And, returns ((a And b) And (c And d)) And (e And f)
- Attributes
- protected
- Definition Classes
- PredicateHelper
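The pairwise, bottom-up combination described for buildBalancedPredicate can be sketched generically. The code below is an illustration over arbitrary values rather than Catalyst expressions; `BalancedReduce` and `balanced` are hypothetical names, and the sketch assumes a non-empty input.

```scala
object BalancedReduce {
  // Combine neighbours pair by pair, level by level, until one value remains.
  // An unpaired trailing element is carried over to the next level unchanged.
  def balanced[A](items: Seq[A])(op: (A, A) => A): A = {
    require(items.nonEmpty, "at least one input is required")
    var current: Vector[A] = items.toVector
    while (current.size > 1) {
      current = current.grouped(2).map {
        case Seq(left, right) => op(left, right)
        case Seq(single)      => single
      }.toVector
    }
    current.head
  }
}
```

With strings standing in for predicates and `(l, r) => s"($l And $r)"` as the operator, the inputs `a, b, c, d` combine to `((a And b) And (c And d))`, matching the shape given in the example above.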
-
def
canEvaluate(expr: Expression, plan: LogicalPlan): Boolean
Returns true if expr can be evaluated using only the output of plan. This method can be used to determine when it is acceptable to move expression evaluation within a query plan.
For example consider a join between two relations R(a, b) and S(c, d).
- canEvaluate(EqualTo(a,b), R) returns true
- canEvaluate(EqualTo(a,c), R) returns false
- canEvaluate(Literal(1), R) returns true as literals CAN be evaluated on any plan
- Attributes
- protected
- Definition Classes
- PredicateHelper
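The R(a, b) example for canEvaluate can be reproduced with a toy expression model. This sketch assumes the check amounts to "every referenced attribute is in the plan's output"; the types `Attr`, `Lit`, `EqualTo` and `Relation` below are simplified stand-ins, not the Catalyst classes.

```scala
object CanEvaluateSketch {
  sealed trait Expr { def references: Set[String] }
  final case class Attr(name: String) extends Expr {
    val references: Set[String] = Set(name)
  }
  final case class Lit(value: Any) extends Expr {
    // A literal references no attributes at all.
    val references: Set[String] = Set.empty
  }
  final case class EqualTo(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }

  // Toy plan: only its output attribute names matter here.
  final case class Relation(output: Set[String])

  // True when every attribute the expression needs is produced by the plan.
  def canEvaluate(expr: Expr, plan: Relation): Boolean =
    expr.references.subsetOf(plan.output)
}
```

Because the empty set is a subset of any output, a literal is accepted for every plan in this model.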
-
def
canEvaluateWithinJoin(expr: Expression): Boolean
Returns true iff expr could be evaluated as a condition within join.
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
def
conf: SQLConf
The active config object within the current scope. See SQLConf.get for more information.
- Definition Classes
- SQLConfHelper
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
extractPredicatesWithinOutputSet(condition: Expression, outputSet: AttributeSet): Option[Expression]
Returns a filter whose references are a subset of outputSet and that contains the maximum constraints from condition. This is used for predicate pushdown. When there is no such filter, None is returned.
- Attributes
- protected
- Definition Classes
- PredicateHelper
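One way to picture "the maximum constraints from condition" is a recursive walk over conjunctions and disjunctions. The sketch below is a toy model under assumed semantics (keep whichever side of an And fits; keep an Or only if both sides fit); `Pred`, `And`, `Or` and `extractWithin` are hypothetical stand-ins and may differ from the actual method.

```scala
object PushdownSketch {
  sealed trait Expr { def references: Set[String] }
  // Leaf predicate: its SQL text and the attribute names it references.
  final case class Pred(sql: String, references: Set[String]) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }
  final case class Or(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }

  def extractWithin(condition: Expr, outputSet: Set[String]): Option[Expr] =
    condition match {
      case And(l, r) =>
        // Either conjunct alone is still a valid (weaker) filter.
        (extractWithin(l, outputSet), extractWithin(r, outputSet)) match {
          case (Some(a), Some(b)) => Some(And(a, b))
          case (a, b)             => a.orElse(b)
        }
      case Or(l, r) =>
        // A disjunction is only usable when both branches are usable.
        for {
          a <- extractWithin(l, outputSet)
          b <- extractWithin(r, outputSet)
        } yield Or(a, b)
      case other =>
        if (other.references.subsetOf(outputSet)) Some(other) else None
    }
}
```

Dropping one side of an Or would make the filter stricter than the original condition, which is why this model keeps a disjunction only as a whole.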
-
def
findExpressionAndTrackLineageDown(exp: Expression, plan: LogicalPlan): Option[(Expression, LogicalPlan)]
Find the origin of where the input references of expression exp were scanned in the tree of plan, and if they originate from a single leaf node. Returns optional tuple with Expression, undoing any projections and aliasing that has been done along the way from plan to origin, and the origin LeafNode plan from which all the exp
- Definition Classes
- PredicateHelper
-
def
getAliasMap(exprs: Seq[NamedExpression]): AttributeMap[Alias]
- Attributes
- protected
- Definition Classes
- AliasHelper
-
def
getAliasMap(plan: Aggregate): AttributeMap[Alias]
- Attributes
- protected
- Definition Classes
- AliasHelper
-
def
getAliasMap(plan: Project): AttributeMap[Alias]
- Attributes
- protected
- Definition Classes
- AliasHelper
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isLikelySelective(e: Expression): Boolean
Returns whether an expression is likely to be selective
- Definition Classes
- PredicateHelper
-
def
isNullIntolerant(expr: Expression): Boolean
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
outputWithNullability(output: Seq[Attribute], nonNullAttrExprIds: Seq[ExprId]): Seq[Attribute]
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
replaceAlias(expr: Expression, aliasMap: AttributeMap[Alias]): Expression
Replace all attributes, that reference an alias, with the aliased expression
- Attributes
- protected
- Definition Classes
- AliasHelper
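Alias replacement as described for replaceAlias can be pictured as a substitution over an expression tree. The following is a minimal sketch with hypothetical types (`Attr`, `Lit`, `Add`), and a plain Map keyed by attribute name stands in for AttributeMap[Alias].

```scala
object AliasSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // aliasMap: attribute name -> the expression the alias stands for.
  def replaceAlias(expr: Expr, aliasMap: Map[String, Expr]): Expr = expr match {
    // An attribute that refers to an alias is swapped for the aliased expression.
    case Attr(name) => aliasMap.getOrElse(name, expr)
    // Otherwise recurse into the children and rebuild the node.
    case Add(l, r)  => Add(replaceAlias(l, aliasMap), replaceAlias(r, aliasMap))
    case lit: Lit   => lit
  }
}
```

For instance, with `x` defined as an alias of `a + 1`, the expression `x + b` becomes `(a + 1) + b` in this model.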
-
def
replaceAliasButKeepName(expr: NamedExpression, aliasMap: AttributeMap[Alias]): NamedExpression
Replace all attributes, that reference an alias, with the aliased expression, but keep the name of the outermost attribute.
- Attributes
- protected
- Definition Classes
- AliasHelper
-
lazy val
ruleId: RuleId
- Attributes
- protected
- Definition Classes
- Rule
-
val
ruleName: String
Name for this rule, automatically inferred based on class name.
- Definition Classes
- Rule
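Inferring a name from the class name could look like the sketch below. It is an assumption about the general approach, not a copy of the Rule implementation: `NamedRule` and `MyToyRule` are hypothetical, and the only behaviour relied on is that the runtime class name of a Scala object ends with `$`.

```scala
object RuleNameSketch {
  trait NamedRule {
    // Derive the name from the runtime class, dropping the trailing '$'
    // that the compiler appends to the class backing a Scala object.
    val ruleName: String = {
      val className = getClass.getName
      if (className.endsWith("$")) className.dropRight(1) else className
    }
  }

  // A singleton rule, analogous in shape to an object extending Rule.
  object MyToyRule extends NamedRule
}
```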
-
def
splitConjunctivePredicates(condition: Expression): Seq[Expression]
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
splitDisjunctivePredicates(condition: Expression): Seq[Expression]
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
trimAliases(e: Expression): Expression
- Attributes
- protected
- Definition Classes
- AliasHelper
-
def
trimNonTopLevelAliases[T <: Expression](e: T): T
- Attributes
- protected
- Definition Classes
- AliasHelper
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated