Package org.apache.polaris.persistence.nosql.maintenance.api
Types of maintenance operations include:
- Purging unreferenced objects and references within a catalog
- Purging whole catalogs that are marked to be purged
- Purging whole realms that are marked to be purged
Discussion
Not all databases support "prefix-key" deletions, which are needed, for example, to purge a
whole realm. Some databases do support "deleting a huge number of rows". Some offer a separate
API for prefix-key deletions, for example, Google's Bigtable dropRowRange on the
table-admin-client. Relational databases may require a different isolation-level configuration
to run those maintenance operations in a "better" way. Some databases do not support such
"prefix-key deletions" at all, for example, Apache Cassandra, RocksDB, or Amazon's DynamoDB.
Backend implementations therefore expose whether they can leverage "prefix-key deletions" when
one or more realms are to be purged. If a Backend does not support "prefix-key deletions", the
whole repository has to be scanned.
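The capability check described above can be sketched as follows. This is a minimal illustration, not the Polaris Backend API: the KeyValueBackend interface, its method names, and the "realmId/objectId" key layout are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; none of these types are part of the Polaris API. */
final class RealmPurgeSketch {
  /** Assumed minimal backend abstraction; keys are laid out as "realmId/objectId". */
  interface KeyValueBackend {
    boolean supportsPrefixDelete();

    void deleteByPrefix(String prefix);

    List<String> scanAllKeys();

    void deleteKey(String key);
  }

  /** In-memory stand-in that can emulate both kinds of databases. */
  static final class InMemoryBackend implements KeyValueBackend {
    final TreeMap<String, String> rows = new TreeMap<>();
    final boolean prefixDelete;

    InMemoryBackend(boolean prefixDelete) {
      this.prefixDelete = prefixDelete;
    }

    public boolean supportsPrefixDelete() {
      return prefixDelete;
    }

    public void deleteByPrefix(String prefix) {
      if (!prefixDelete) {
        throw new UnsupportedOperationException("prefix-key deletion not supported");
      }
      rows.keySet().removeIf(k -> k.startsWith(prefix));
    }

    public List<String> scanAllKeys() {
      return new ArrayList<>(rows.keySet());
    }

    public void deleteKey(String key) {
      rows.remove(key);
    }
  }

  /** Purges the given realms, using prefix deletion when the backend offers it. */
  static void purgeRealms(KeyValueBackend backend, Set<String> realmIds) {
    if (backend.supportsPrefixDelete()) {
      for (String realm : realmIds) {
        backend.deleteByPrefix(realm + "/");
      }
      return;
    }
    // Fallback: scan the whole repository and delete matching rows one by one.
    for (String key : backend.scanAllKeys()) {
      int slash = key.indexOf('/');
      if (slash >= 0 && realmIds.contains(key.substring(0, slash))) {
        backend.deleteKey(key);
      }
    }
  }
}
```

Both paths leave the same rows behind; they differ only in cost, since the fallback has to touch every key in the repository.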
Purging unreferenced data
The other maintenance operations, like purging a catalog or purging unreferenced objects and references, use the following approach, which works even for large multi-tenant setups:
- Memoize the current timestamp, minus some amount to account for expected wall-clock drift.
- Identify all objects and references that must be retained, and memoize those in a probabilistic data structure (Bloom filter). See below.
- Scan the whole database to identify the objects and references that were not identified as being referenced in the previous step.
- Delete the unreferenced objects and references if, and only if, their
createdAtMicros() timestamp is less than the timestamp memoized in the first step.
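The steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the hand-rolled BloomFilter, the StoredObj class, and the findPurgeable method are hypothetical names for this example, and the filter size and hash count are arbitrary. Only the createdAtMicros timestamp comes from the text above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical sketch of the purge steps; not the actual maintenance service. */
final class UnreferencedPurgeSketch {
  /** Tiny Bloom filter: never reports a false negative, may report false positives. */
  static final class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilter(int size, int hashes) {
      this.bits = new BitSet(size);
      this.size = size;
      this.hashes = hashes;
    }

    void add(String id) {
      for (int i = 0; i < hashes; i++) {
        bits.set(index(id, i));
      }
    }

    boolean mightContain(String id) {
      for (int i = 0; i < hashes; i++) {
        if (!bits.get(index(id, i))) {
          return false;
        }
      }
      return true;
    }

    /** Double hashing: combines two base hashes to derive the i-th bit index. */
    private int index(String id, int i) {
      CRC32 crc = new CRC32();
      crc.update(id.getBytes(StandardCharsets.UTF_8));
      int h1 = id.hashCode();
      int h2 = (int) crc.getValue();
      return Math.floorMod(h1 + i * h2, size);
    }
  }

  /** Assumed shape of a scanned row: an ID plus its creation timestamp. */
  static final class StoredObj {
    final String id;
    final long createdAtMicros;

    StoredObj(String id, long createdAtMicros) {
      this.id = id;
      this.createdAtMicros = createdAtMicros;
    }
  }

  static List<String> findPurgeable(
      List<StoredObj> allObjects,
      Iterable<String> retainedIds,
      long nowMicros,
      long driftAllowanceMicros) {
    // Step 1: memoize "now" minus an allowance for wall-clock drift.
    long cutoffMicros = nowMicros - driftAllowanceMicros;
    // Step 2: memoize everything that must be retained.
    BloomFilter retained = new BloomFilter(1 << 16, 3);
    for (String id : retainedIds) {
      retained.add(id);
    }
    // Steps 3 and 4: scan everything; purge only unreferenced objects older than the cutoff.
    List<String> purgeable = new ArrayList<>();
    for (StoredObj obj : allObjects) {
      if (!retained.mightContain(obj.id) && obj.createdAtMicros < cutoffMicros) {
        purgeable.add(obj.id);
      }
    }
    return purgeable;
  }
}
```

A false positive in the filter only means that an unreferenced object survives until a later run; a retained object is never reported as absent. The timestamp check protects objects that were written after identification started and therefore could not have been seen yet.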
Identifying objects and references
Implementations of @ApplicationScoped PerRealmRetainedIdentifier are called to identify the
references and objects that have to be retained for a realm.
Implementations of @ApplicationScoped ObjTypeRetainedIdentifier are called for each identified
object of the requested object type.
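How the two kinds of identifiers might cooperate can be sketched as a simple worklist traversal. The interfaces below are hypothetical stand-ins written for this example; they do not reproduce the signatures of PerRealmRetainedIdentifier or ObjTypeRetainedIdentifier, and the Obj class with its children list is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch; the real identifier interfaces may look quite different. */
final class RetainedIdentifierSketch {
  /** Assumed per-realm callback: reports the IDs a realm must keep as starting points. */
  interface RealmIdentifier {
    void identify(String realmId, Consumer<String> retain);
  }

  /** Assumed per-type callback: reports the IDs referenced by one object of its type. */
  interface TypeIdentifier {
    void identify(Obj obj, Consumer<String> retain);
  }

  /** Assumed object shape: an ID, a type name, and the IDs it references. */
  static final class Obj {
    final String id;
    final String type;
    final List<String> children;

    Obj(String id, String type, List<String> children) {
      this.id = id;
      this.type = type;
      this.children = children;
    }
  }

  static Set<String> collectRetained(
      String realmId,
      RealmIdentifier realmIdentifier,
      Map<String, TypeIdentifier> identifiersByType,
      Map<String, Obj> store) {
    Set<String> retained = new LinkedHashSet<>();
    Deque<String> pending = new ArrayDeque<>();
    // Every newly retained ID is queued so that its type identifier gets a chance to run.
    Consumer<String> retain = id -> {
      if (retained.add(id)) {
        pending.add(id);
      }
    };
    realmIdentifier.identify(realmId, retain);
    while (!pending.isEmpty()) {
      Obj obj = store.get(pending.poll());
      if (obj == null) {
        continue; // retained ID without a stored object; nothing more to follow
      }
      TypeIdentifier identifier = identifiersByType.get(obj.type);
      if (identifier != null) {
        identifier.identify(obj, retain);
      }
    }
    return retained;
  }
}
```

Object types without a registered identifier are still retained themselves; they simply contribute no further references.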
Realm status
The maintenance service implementation checks the current status of each realm that is to be retained or purged and verifies that the status is valid for being retained (valid: ACTIVE and INACTIVE) or for being purged (valid: PURGING). Realms that have been asked to be purged and for which no data has been encountered are state-transitioned to PURGED.
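The status rules above can be sketched as follows. The enum lists only the four states named in this section, and the class and method names are hypothetical; the actual realm status type may define more states.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the status checks; not the actual realm management API. */
final class RealmStatusSketch {
  /** Only the states mentioned in the text; the real enum may contain others. */
  enum RealmStatus {
    ACTIVE,
    INACTIVE,
    PURGING,
    PURGED
  }

  private static final Set<RealmStatus> RETAINABLE =
      EnumSet.of(RealmStatus.ACTIVE, RealmStatus.INACTIVE);

  /** A realm may be processed for retention only when ACTIVE or INACTIVE. */
  static boolean mayRetain(RealmStatus status) {
    return RETAINABLE.contains(status);
  }

  /** A realm may be purged only when it is in state PURGING. */
  static boolean mayPurge(RealmStatus status) {
    return status == RealmStatus.PURGING;
  }

  /** A PURGING realm for which the scan encountered no data moves on to PURGED. */
  static RealmStatus afterMaintenanceRun(RealmStatus status, long rowsEncountered) {
    if (status == RealmStatus.PURGING && rowsEncountered == 0) {
      return RealmStatus.PURGED;
    }
    return status;
  }
}
```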
System realm "::system::"
The system realm is maintained like every other realm.
Future export use cases (TBD/TBC)
These can be useful in a hosted and multi-tenant SaaS environment, when an export of the data for a particular realm is requested.
- Export live/referenced objects, filtered by realm. A possible implementation would hook
into the implementation of PerRealmRetainedIdentifier via a delegate over Persistence. The
actual approach and implementation is therefore out of the scope of the maintenance service.
- Low-level export, filtered by realm. This one is different from the one above, as it would
export references and all object-parts, in contrast to fully materialized objects. A possible
implementation would hook into the scanning-part of the maintenance service implementation.
Interfaces
- Maintenance service configuration.
- Configures a maintenance run.