package python

Type Members

  1. class BatchQueue extends AutoCloseable with Arm

    A simple queue that holds the pending batches that need to be lined up and combined with the batches coming back from Python.

  2. class BufferToStreamWriter extends HostBufferConsumer with Arm
  3. case class GpuAggregateInPandasExec(gpuGroupingExpressions: Seq[NamedExpression], udfExpressions: Seq[GpuPythonUDF], resultExpressions: Seq[NamedExpression], child: SparkPlan)(cpuGroupingExpressions: Seq[NamedExpression]) extends SparkPlan with ShimUnaryExecNode with GpuPythonExecBase with Product with Serializable

    Physical node for aggregation with group aggregate Pandas UDF.

    This plan works by sending the necessary (projected) grouped input data as Arrow record batches to the Python worker. The Python worker invokes the UDF and sends the results back to the executor. Finally, the executor evaluates any post-aggregation expressions and joins the result with the grouping key.

    This node aims to accelerate data transfer between the JVM and Python for the GPU pipeline, and to schedule GPU resources for its Python processes.

  4. class GpuAggregateInPandasExecMeta extends SparkPlanMeta[AggregateInPandasExec]
  5. case class GpuArrowEvalPythonExec(udfs: Seq[GpuPythonUDF], resultAttrs: Seq[Attribute], child: SparkPlan, evalType: Int) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable

    A physical plan that evaluates a GpuPythonUDF. The transformation of the data to Arrow happens on the GPU (practically a no-op), but execution of the UDFs is on the CPU.

  6. class GpuArrowPythonRunner extends ShimBasePythonRunner[ColumnarBatch, ColumnarBatch] with GpuPythonArrowOutput

    Similar to PythonUDFRunner, but exchanges data with the Python worker via an Arrow stream.

  7. case class GpuFlatMapCoGroupsInPandasExec(leftGroup: Seq[Attribute], rightGroup: Seq[Attribute], func: Expression, output: Seq[Attribute], left: SparkPlan, right: SparkPlan) extends SparkPlan with ShimBinaryExecNode with GpuExec with Product with Serializable
  8. class GpuFlatMapCoGroupsInPandasExecMeta extends SparkPlanMeta[FlatMapCoGroupsInPandasExec]
  9. case class GpuMapInPandasExec(func: Expression, output: Seq[Attribute], child: SparkPlan) extends SparkPlan with ShimUnaryExecNode with GpuPythonExecBase with Product with Serializable
  10. class GpuMapInPandasExecMeta extends SparkPlanMeta[MapInPandasExec]
  11. trait GpuPythonArrowOutput extends Arm

    A trait that can be mixed-in with GpuArrowPythonRunner. It implements the logic from Python (Arrow) to GPU/JVM (ColumnarBatch).

  12. trait GpuPythonExecBase extends SparkPlan with GpuExec
  13. case class GpuPythonUDF(name: String, func: PythonFunction, dataType: DataType, children: Seq[Expression], evalType: Int, udfDeterministic: Boolean, resultId: ExprId = NamedExpression.newExprId) extends Expression with GpuUnevaluable with NonSQLExpression with UserDefinedExpression with GpuAggregateWindowFunction with Product with Serializable

    A serialized version of a Python lambda function. This is a special expression, which needs a dedicated physical operator to execute it, and thus can't be pushed down to data sources.

  14. trait GpuWindowInPandasExecBase extends SparkPlan with ShimUnaryExecNode with GpuExec
  15. abstract class GpuWindowInPandasExecMetaBase extends SparkPlanMeta[WindowInPandasExec]
  16. case class GroupArgs(dedupAttrs: Seq[Attribute], argOffsets: Array[Int], groupingOffsets: Seq[Int]) extends Product with Serializable

    A helper class to pack the group related items for the Python input.

    dedupAttrs

    the deduplicated attributes for the output of a Spark plan.

    argOffsets

    the argument offsets that the Python workers use to distinguish grouping columns from data columns.

    groupingOffsets

    the grouping offsets (aka column indices) in the deduplicated attributes.

  17. class GroupingIterator extends Iterator[ColumnarBatch] with Arm

    This iterator groups the rows in the incoming batches per the window "partitionBy" specification, so that each group goes into exactly one batch and each batch contains the data of exactly one group.

  18. class RebatchingRoundoffIterator extends Iterator[ColumnarBatch] with Arm

    This iterator will round incoming batches to multiples of targetRoundoff rows, if possible. The last batch might not be a multiple of it.

  19. class StreamToBufferProvider extends HostBufferProvider
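
The rounding behavior described for RebatchingRoundoffIterator above can be illustrated with a small sketch. This is not the library's implementation: it models each batch by its row count only, and the names RoundoffSketch and roundoff are hypothetical. It assumes that leftover rows are carried over into the next output batch, and that only the final batch may fall short of a multiple of targetRoundoff.

```scala
object RoundoffSketch {
  // Hypothetical helper: each "batch" is represented only by its row count.
  // Re-slices a stream of batch sizes so that every emitted size, except
  // possibly the last one, is a multiple of targetRoundoff.
  def roundoff(batchRows: Iterator[Int], targetRoundoff: Int): Iterator[Int] = {
    require(targetRoundoff > 0, "targetRoundoff must be positive")
    new Iterator[Int] {
      private var pending = 0 // rows buffered but not yet emitted

      // Pull input until at least one full multiple is buffered or input ends.
      private def fill(): Unit =
        while (pending < targetRoundoff && batchRows.hasNext) pending += batchRows.next()

      override def hasNext: Boolean = { fill(); pending > 0 }

      override def next(): Int = {
        if (!hasNext) throw new NoSuchElementException("no more batches")
        val emit =
          if (pending >= targetRoundoff) pending - (pending % targetRoundoff)
          else pending // input exhausted: emit the remainder as the last batch
        pending -= emit
        emit
      }
    }
  }
}
```

For example, `RoundoffSketch.roundoff(Iterator(5, 7, 3), 4).toList` yields `List(4, 8, 3)`: the first two outputs are multiples of 4, and the last batch holds the 3 remaining rows.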

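The one-group-per-batch contract described for GroupingIterator above can be sketched in the same spirit. This is a simplified CPU-side model, not the GPU implementation: a batch is a Seq of (key, value) rows, the names GroupingSketch and regroup are hypothetical, and the input is assumed to be already clustered by the partition key, so rows with equal keys are adjacent even across batch boundaries.

```scala
object GroupingSketch {
  // Hypothetical model: a "batch" is a Seq of (key, value) rows. The input is
  // assumed to be clustered by key, so equal keys are adjacent across batches.
  def regroup[K, V](batches: Iterator[Seq[(K, V)]]): Iterator[Seq[(K, V)]] = {
    // View the incoming batches as one buffered stream of rows.
    val rows = batches.flatMap(_.iterator).buffered
    new Iterator[Seq[(K, V)]] {
      override def hasNext: Boolean = rows.hasNext

      override def next(): Seq[(K, V)] = {
        if (!rows.hasNext) throw new NoSuchElementException("no more groups")
        val key = rows.head._1
        val group = Seq.newBuilder[(K, V)]
        // Collect consecutive rows sharing the current key into one batch.
        while (rows.hasNext && rows.head._1 == key) group += rows.next()
        group.result()
      }
    }
  }
}
```

In this model a group that straddles two input batches is stitched back together, and an input batch holding several groups is split, so each output batch carries exactly one group.
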
Value Members

  1. object GpuPythonHelper extends Logging
  2. object GpuPythonUDF extends Serializable

    Helper functions for GpuPythonUDF.
