Interface SupportsPartitionPushDown
ScanTableSource.
Partitions split the data stored in an external system into smaller portions that are
identified by one or more string-based partition keys. A single partition is represented as a
Map < String, String > which maps each partition key to a partition value. Partition keys
and their order is defined by the catalog table.
For example, data can be partitioned by region and within a region partitioned by month. The order of the partition keys (in the example: first by region then by month) is defined by the catalog table. A list of partitions could be:
List(
['region'='europe', 'month'='2020-01'],
['region'='europe', 'month'='2020-02'],
['region'='asia', 'month'='2020-01'],
['region'='asia', 'month'='2020-02']
)
By default, if this interface is not implemented, the data is read entirely with a subsequent filter operation after the source.
For efficiency, the planner can pass the number of required partitions and a source must
exclude those partitions from reading (including reading the metadata). See applyPartitions(List).
By default, the list of all partitions is queried from the catalog if necessary. However,
depending on the external system, it might be necessary to query the list of partitions in a
connector-specific way instead of using the catalog information. See listPartitions().
Note: After partitions are pushed into the source, the runtime will not perform a subsequent filter operation for partition keys.
-
Method Summary
-
Method Details
-
listPartitions
Returns a list of all partitions that a source can read if available.A single partition maps each partition key to a partition value.
If
Optional.empty()is returned, the list of partitions is queried from the catalog. -
applyPartitions
Provides a list of remaining partitions. After those partitions are applied, a source must not read the data of other partitions during runtime.See the documentation of
SupportsPartitionPushDownfor more information.
-